git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "William Sprent via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Eric Sunshine" <sunshine@sunshineco.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Victoria Dye" <vdye@github.com>,
	"Elijah Newren" <newren@gmail.com>,
	"William Sprent" <williams@unity3d.com>,
	"William Sprent" <williams@unity3d.com>
Subject: [PATCH v2] ls-tree: add --sparse-filter-oid argument
Date: Mon, 23 Jan 2023 11:46:11 +0000	[thread overview]
Message-ID: <pull.1459.v2.git.1674474371817.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1459.git.1673456518993.gitgitgadget@gmail.com>

From: William Sprent <williams@unity3d.com>

There is currently no way to ask git the question "which files would be
part of a sparse checkout of commit X with sparse checkout patterns Y".
One use-case would be that tooling may want know whether sparse checkouts
of two commits contain the same content even if the full trees differ.
Another interesting use-case would be for tooling to use in conjunction
with 'git update-index --index-info'.

'rev-list --objects --filter=sparse:oid' comes close, but as rev-list is
concerned with objects rather than directory trees, it leaves files out
when the same blob occurs in at two different paths.

It is possible to ask git about the sparse status of files currently in
the index with 'ls-files -t'. However, this does not work well when the
caller is interested in another commit, intererested in sparsity
patterns that aren't currently in '.git/info/sparse-checkout', or when
working in with bare repo.

To fill this gap, add a new argument to ls-tree '--sparse-filter-oid'
which takes the object id of a blob containing sparse checkout patterns
that filters the output of 'ls-tree'. When filtering with given sparsity
patterns, 'ls-tree' only outputs blobs and commit objects that
match the given patterns.

While it may be valid in some situations to output a tree object -- e.g.
when a cone pattern matches all blobs recursively contained in a tree --
it is less unclear what should be output if a sparse pattern matches
parts of a tree.

To allow for reusing the pattern matching logic found in
'path_in_sparse_checkout_1()' in 'dir.c' with arbitrary patterns,
extract the pattern matching part of the function into its own new
function 'recursively_match_path_with_sparse_patterns()'.

Signed-off-by: William Sprent <williams@unity3d.com>
---
    ls-tree: add --sparse-filter-oid argument
    
    I'm resubmitting this change as rebased on top of 'master', as it
    conflicted with the topic 'ls-tree.c: clean-up works' 1
    [https://public-inbox.org/git/20230112091135.20050-1-tenglong.tl@alibaba-inc.com],
    which was merged to 'master' recently.
    
    This versions also incorporates changes based on the comments made in 2
    [https://public-inbox.org/git/CAPig+cRgZ0CrkqY7mufuWmhf6BC8yXjXXuOTEQjuz+Y0NA+N7Q@mail.gmail.com/].
    
    I'm also looping in contributors that have touched ls-tree and/or
    sparse-checkouts recently. I hope that's okay.
    
    William

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1459%2Fwilliams-unity%2Fls-tree-sparse-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1459/williams-unity/ls-tree-sparse-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1459

Contributor requested no range-diff. You can review it using these commands:
   git fetch https://github.com/gitgitgadget/git a38d39a4 07a4e24e
   git range-diff <options> a38d39a4..8891b110 56c8fb1e..07a4e24e

 Documentation/git-ls-tree.txt      |  6 ++
 builtin/ls-tree.c                  | 79 +++++++++++++++++++++++-
 dir.c                              | 45 ++++++++------
 dir.h                              |  5 ++
 t/t3106-ls-tree-sparse-checkout.sh | 99 ++++++++++++++++++++++++++++++
 5 files changed, 213 insertions(+), 21 deletions(-)
 create mode 100755 t/t3106-ls-tree-sparse-checkout.sh

diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index 0240adb8eec..724bb1f9dbd 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git ls-tree' [-d] [-r] [-t] [-l] [-z]
 	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]] [--format=<format>]
+	    [--sparse-filter-oid=<blob-ish>]
 	    <tree-ish> [<path>...]
 
 DESCRIPTION
@@ -48,6 +49,11 @@ OPTIONS
 	Show tree entries even when going to recurse them. Has no effect
 	if `-r` was not passed. `-d` implies `-t`.
 
+--sparse-filter-oid=<blob-ish>::
+	Omit showing tree objects and paths that do not match the
+	sparse-checkout specification contained in the blob
+	(or blob-expression) <blob-ish>.
+
 -l::
 --long::
 	Show object size of blob (file) entries.
diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 72eb70823d5..46a815fc7dc 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -13,6 +13,7 @@
 #include "builtin.h"
 #include "parse-options.h"
 #include "pathspec.h"
+#include "dir.h"
 
 static const char * const ls_tree_usage[] = {
 	N_("git ls-tree [<options>] <tree-ish> [<path>...]"),
@@ -364,6 +365,65 @@ static struct ls_tree_cmdmode_to_fmt ls_tree_cmdmode_format[] = {
 	},
 };
 
+struct sparse_filter_data {
+	read_tree_fn_t fn;
+	struct pattern_list pl;
+	struct strbuf full_path_buf;
+	struct ls_tree_options *options;
+};
+
+static void init_sparse_filter_data(struct sparse_filter_data **d, struct ls_tree_options *options,
+	const char *sparse_oid_name, read_tree_fn_t fn)
+{
+	struct object_id sparse_oid;
+	struct object_context oc;
+
+	(*d) = xcalloc(1, sizeof(**d));
+	(*d)->fn = fn;
+	(*d)->pl.use_cone_patterns = core_sparse_checkout_cone;
+	(*d)->options = options;
+	strbuf_init(&(*d)->full_path_buf, 0);
+
+	if (get_oid_with_context(the_repository, sparse_oid_name, GET_OID_BLOB,
+			&sparse_oid, &oc))
+		die(_("unable to access sparse blob in '%s'"), sparse_oid_name);
+	if (add_patterns_from_blob_to_list(&sparse_oid, "", 0, &(*d)->pl) < 0)
+		die(_("unable to parse sparse filter data in %s"),
+		    oid_to_hex(&sparse_oid));
+}
+
+static void free_sparse_filter_data(struct sparse_filter_data *d)
+{
+	clear_pattern_list(&d->pl);
+	strbuf_release(&d->full_path_buf);
+	free(d);
+}
+
+static int path_matches_sparse_checkout_patterns(struct strbuf *full_path, struct pattern_list *pl, int dtype)
+{
+	enum pattern_match_result match = recursively_match_path_with_sparse_patterns(full_path->buf, the_repository->index, dtype, pl);
+	return match > 0;
+}
+
+static int filter_sparse(const struct object_id *oid, struct strbuf *base,
+			 const char *pathname, unsigned mode, void *context)
+{
+	struct sparse_filter_data *data = context;
+
+	int do_recurse = show_recursive(data->options, base->buf, base->len, pathname);
+	if (object_type(mode) == OBJ_TREE)
+		return do_recurse;
+
+	strbuf_reset(&data->full_path_buf);
+	strbuf_addbuf(&data->full_path_buf, base);
+	strbuf_addstr(&data->full_path_buf, pathname);
+
+	if (!path_matches_sparse_checkout_patterns(&data->full_path_buf, &data->pl, DT_REG))
+			return 0;
+
+	return data->fn(oid, base, pathname, mode, data->options);
+}
+
 int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 {
 	struct object_id oid;
@@ -372,7 +432,11 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	read_tree_fn_t fn = NULL;
 	enum ls_tree_cmdmode cmdmode = MODE_DEFAULT;
 	int null_termination = 0;
+
 	struct ls_tree_options options = { 0 };
+	char *sparse_oid_name = NULL;
+	void *read_tree_context = NULL;
+	struct sparse_filter_data *sparse_filter_data = NULL;
 	const struct option ls_tree_options[] = {
 		OPT_BIT('d', NULL, &options.ls_options, N_("only show trees"),
 			LS_TREE_ONLY),
@@ -399,6 +463,9 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 					 N_("format to use for the output"),
 					 PARSE_OPT_NONEG),
 		OPT__ABBREV(&options.abbrev),
+		OPT_STRING_F(0, "filter-sparse-oid", &sparse_oid_name, N_("filter-sparse-oid"),
+			   N_("filter output with sparse-checkout specification contained in the blob"),
+			     PARSE_OPT_NONEG),
 		OPT_END()
 	};
 	struct ls_tree_cmdmode_to_fmt *m2f = ls_tree_cmdmode_format;
@@ -474,7 +541,17 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 		break;
 	}
 
-	ret = !!read_tree(the_repository, tree, &options.pathspec, fn, &options);
+	if (sparse_oid_name) {
+		init_sparse_filter_data(&sparse_filter_data, &options, sparse_oid_name, fn);
+		read_tree_context = sparse_filter_data;
+		fn = filter_sparse;
+	} else
+		read_tree_context = &options;
+
+	ret = !!read_tree(the_repository, tree, &options.pathspec, fn, read_tree_context);
 	clear_pathspec(&options.pathspec);
+	if (sparse_filter_data)
+		free_sparse_filter_data(sparse_filter_data);
+
 	return ret;
 }
diff --git a/dir.c b/dir.c
index 4e99f0c868f..122ebced08e 100644
--- a/dir.c
+++ b/dir.c
@@ -1457,45 +1457,50 @@ int init_sparse_checkout_patterns(struct index_state *istate)
 	return 0;
 }
 
-static int path_in_sparse_checkout_1(const char *path,
-				     struct index_state *istate,
-				     int require_cone_mode)
+int recursively_match_path_with_sparse_patterns(const char *path,
+						struct index_state *istate,
+						int dtype,
+						struct pattern_list *pl)
 {
-	int dtype = DT_REG;
 	enum pattern_match_result match = UNDECIDED;
 	const char *end, *slash;
-
-	/*
-	 * We default to accepting a path if the path is empty, there are no
-	 * patterns, or the patterns are of the wrong type.
-	 */
-	if (!*path ||
-	    init_sparse_checkout_patterns(istate) ||
-	    (require_cone_mode &&
-	     !istate->sparse_checkout_patterns->use_cone_patterns))
-		return 1;
-
 	/*
 	 * If UNDECIDED, use the match from the parent dir (recursively), or
 	 * fall back to NOT_MATCHED at the topmost level. Note that cone mode
 	 * never returns UNDECIDED, so we will execute only one iteration in
 	 * this case.
 	 */
-	for (end = path + strlen(path);
-	     end > path && match == UNDECIDED;
+	for (end = path + strlen(path); end > path && match == UNDECIDED;
 	     end = slash) {
-
 		for (slash = end - 1; slash > path && *slash != '/'; slash--)
 			; /* do nothing */
 
 		match = path_matches_pattern_list(path, end - path,
 				slash > path ? slash + 1 : path, &dtype,
-				istate->sparse_checkout_patterns, istate);
+				pl, istate);
 
 		/* We are going to match the parent dir now */
 		dtype = DT_DIR;
 	}
-	return match > 0;
+
+	return match;
+}
+
+static int path_in_sparse_checkout_1(const char *path,
+				     struct index_state *istate,
+				     int require_cone_mode)
+{
+	/*
+	 * We default to accepting a path if the path is empty, there are no
+	 * patterns, or the patterns are of the wrong type.
+	 */
+	if (!*path ||
+	    init_sparse_checkout_patterns(istate) ||
+	    (require_cone_mode &&
+	     !istate->sparse_checkout_patterns->use_cone_patterns))
+		return 1;
+
+	return recursively_match_path_with_sparse_patterns(path, istate, DT_REG, istate->sparse_checkout_patterns) > 0;
 }
 
 int path_in_sparse_checkout(const char *path,
diff --git a/dir.h b/dir.h
index 8acfc044181..d186d271606 100644
--- a/dir.h
+++ b/dir.h
@@ -403,6 +403,11 @@ int path_in_sparse_checkout(const char *path,
 int path_in_cone_mode_sparse_checkout(const char *path,
 				      struct index_state *istate);
 
+int recursively_match_path_with_sparse_patterns(const char *path,
+						struct index_state *istate,
+						int dtype,
+						struct pattern_list *pl);
+
 struct dir_entry *dir_add_ignored(struct dir_struct *dir,
 				  struct index_state *istate,
 				  const char *pathname, int len);
diff --git a/t/t3106-ls-tree-sparse-checkout.sh b/t/t3106-ls-tree-sparse-checkout.sh
new file mode 100755
index 00000000000..945f3ed1888
--- /dev/null
+++ b/t/t3106-ls-tree-sparse-checkout.sh
@@ -0,0 +1,99 @@
+#!/bin/sh
+
+test_description='ls-tree with sparse filter patterns'
+
+. ./test-lib.sh
+
+check_agrees_with_ls_files () {
+	REPO=repo  &&
+	git -C $REPO submodule deinit -f --all &&
+	git -C $REPO cat-file -p $filter_oid >"$REPO/.git/info/sparse-checkout" &&
+	git -C $REPO sparse-checkout init --cone &&
+	git -C $REPO submodule init &&
+	git -C $REPO ls-files -t >out &&
+	sed -n "/^S /!s/^. //p" out >expect &&
+	test_cmp expect actual
+}
+
+check_same_result_in_bare_repo () {
+	FILTER_OID=$1 &&
+	git -C repo cat-file -p $FILTER_OID >out &&
+	git -C bare hash-object -w --stdin <out &&
+	git -C bare ls-tree --name-only --filter-sparse-oid=$FILTER_OID -r HEAD >bare-result &&
+	test_cmp expect bare-result
+}
+
+test_expect_success 'setup' '
+	git init submodule &&
+	test_commit -C submodule file &&
+	git init repo &&
+	(
+		cd repo &&
+		mkdir dir &&
+		test_commit dir/sub-file &&
+		test_commit dir/sub-file2 &&
+		mkdir dir2 &&
+		test_commit dir2/sub-file1 &&
+		test_commit dir2/sub-file2 &&
+		test_commit top-file &&
+		git clone ../submodule submodule &&
+		git submodule add ./submodule &&
+		git submodule absorbgitdirs &&
+		git commit -m"add submodule" &&
+		git sparse-checkout init --cone
+	) &&
+	git clone --bare ./repo bare
+'
+
+test_expect_success 'toplevel filter only shows toplevel file' '
+	filter_oid=$(git -C repo hash-object -w --stdin <<-\EOF
+	/*
+	!/*/
+	EOF
+	) &&
+	cat >expect <<-EOF &&
+	.gitmodules
+	submodule
+	top-file.t
+	EOF
+	git -C repo ls-tree --name-only --filter-sparse-oid=$filter_oid -r HEAD >actual &&
+	test_cmp expect actual &&
+	check_agrees_with_ls_files &&
+	check_same_result_in_bare_repo $filter_oid
+'
+
+test_expect_success 'non cone single file filter' '
+	filter_oid=$(git -C repo hash-object -w --stdin <<-\EOF
+	/dir/sub-file.t
+	EOF
+	) &&
+	cat >expect <<-EOF &&
+	dir/sub-file.t
+	EOF
+	git -C repo ls-tree --name-only --filter-sparse-oid=$filter_oid -r HEAD >actual &&
+	test_cmp expect actual &&
+	check_agrees_with_ls_files &&
+	check_same_result_in_bare_repo $filter_oid
+'
+
+test_expect_success 'cone filter matching one dir' '
+	filter_oid=$(git -C repo hash-object -w --stdin <<-\EOF
+	/*
+	!/*/
+	/dir/
+	EOF
+	) &&
+	cat >expect <<-EOF &&
+	.gitmodules
+	dir/sub-file.t
+	dir/sub-file2.t
+	submodule
+	top-file.t
+	EOF
+	git -C repo ls-tree --name-only --filter-sparse-oid=$filter_oid -r HEAD >actual &&
+	test_cmp expect actual &&
+	check_agrees_with_ls_files &&
+	check_same_result_in_bare_repo $filter_oid
+'
+
+test_done

base-commit: 56c8fb1e95377900ec9d53c07886022af0a5d3c2
-- 
gitgitgadget

  parent reply	other threads:[~2023-01-23 11:46 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-11 17:01 [PATCH] ls-tree: add --sparse-filter-oid argument William Sprent via GitGitGadget
2023-01-13 14:17 ` Eric Sunshine
2023-01-13 20:01   ` Junio C Hamano
2023-01-16 15:13     ` William Sprent
2023-01-16 12:14   ` William Sprent
2023-01-23 11:46 ` William Sprent via GitGitGadget [this message]
2023-01-23 13:00   ` [PATCH v2] " Ævar Arnfjörð Bjarmason
2023-01-24 15:30     ` William Sprent
2023-01-23 13:06   ` Ævar Arnfjörð Bjarmason
2023-01-24 15:30     ` William Sprent
2023-01-24 20:11   ` Victoria Dye
2023-01-25 13:47     ` William Sprent
2023-01-25 18:32       ` Victoria Dye
2023-01-26 14:55         ` William Sprent
2023-01-25  5:11   ` Elijah Newren
2023-01-25 16:16     ` William Sprent
2023-01-26  3:25       ` Elijah Newren
2023-01-27 11:58         ` William Sprent
2023-01-28 16:45           ` Elijah Newren
2023-01-30 15:28             ` William Sprent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1459.v2.git.1674474371817.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=sunshine@sunshineco.com \
    --cc=vdye@github.com \
    --cc=williams@unity3d.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).