git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/10] [RFC] In-tree sparse-checkout definitions
@ 2020-05-07 13:17 Derrick Stolee via GitGitGadget
  2020-05-07 13:17 ` [PATCH 01/10] unpack-trees: avoid array out-of-bounds error Derrick Stolee via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee

This is based on ds/sparse-allow-empty-working-tree.

BACKGROUND
==========

After discussing the future of sparse-checkout at the Git Contributor
Summit, Minh Tai caught me during the break to introduce a very interesting
idea: in-tree sparse-checkout definitions.

Here is the context: I was talking about how the Office team is building a
tool to update the sparse-checkout definition dynamically as their build
definitions change. They have a very compartmentalized build system, and
certain "projects" depend on each other. To ensure that builds do not break
as dependencies change, the plan was to have this tool run as a hook when
HEAD changes. This is a bit expensive to do, and we were discussing how much
we need this to be automated versus a part of the build process.

This kind of idea would need to be replicated for every team that wants to
use sparse-checkout in this way. That doesn't bode well for wide adoption of
the feature.

IN-TREE SPARSE-CHECKOUT DEFINITIONS
===================================

Minh's idea was simple: have sparse-checkout files in the working directory
and use config to point to them. As these in-tree files update, we can
automatically update the sparse-checkout definition accordingly. Now, the
only thing to do would be to ensure that the sparse-checkout files are
updated when someone updates the build definitions. This requires some extra
build validation, but would not require special tools built on every client.

As soon as I sat down to implement this idea, I discovered some
complications to that basic idea. Each of these sparse-checkout definitions
should fit a "role" on the team. To continue with the Office analogy,
suppose we had a definition for developers working on each of Word,
PowerPoint, and Excel. What happens when someone wants to port a feature
from Excel to Word? That user would want to take the union of the two
sparse-checkout definitions. But what does that mean when we are working
with arbitrary sparse-checkout files?

This leads to the restriction that I built into this feature: we only care
about "cone mode" definitions. These rely on directory-based prefix matches
that are very fast. With this restriction, it is simple to understand what
the "union" operation should do: take all directories that would be included
by either.

Once we restrict to cone mode, there is no reason to continue storing files
using the sparse-checkout file format. The list of patterns is larger than
the input to "git sparse-checkout set" or output from "git sparse-checkout
list" in cone mode, and more prone to user error. Thus, let's be simpler and
use a list of directories to specify the sparse-checkout definition.

This leads to the second issue that complicates things. If a build
dependency changes to a core library, but there are N "roles" that depend on
that library, then the person changing that library also needs to change and
validate N sparse-checkout definitions! This does not scale well. So, let's
take a page from build dependencies and allow these in-tree sparse-checkout
definitions to depend on each other!

The end result is a mechanism that should be flexible enough to future needs
but simple enough to build using existing Git features. The format re-uses
the config file format. This provides good re-use of code as well as being
something easy to extend in the future. In Patch 4, I tried to describe
several alternatives and why this is the best of those options. I'm open to
suggestions here.

OUTLINE
=======

 * Patch 1 is a bugfix that should work on its own. I caught it in tests
   after Patch 5.
   
   
 * Patches 2-7 implement the feature.
   
   
 * Patches 8-9 make the Git build system a bit more robust to missing
   directories.
   
   
 * Patch 10 adds a .sparse directory to Git as well as two in-tree
   sparse-checkout definitions. One is for a bare-bones Linux developer and
   the other adds the compat code for Windows on top.
   
   

As mentioned in patch 10, this allows the following workflow for Git
contributors that want to "eat our own dogfood" with the partial clone and
sparse-checkout features:

 $ git clone --sparse --filter=blob:none https://github.com/git/git$ cd git
$ git sparse-checkout set --in-tree .sparse/base.deps

Then, we have a smaller working directory but can still build and test the
code.

FUTURE DIRECTIONS
=================

There are definitely ways to make this feature more seamless, or more useful
to us.

 * It would be nice to extend git clone to have a string value to 
   --sparse=<file> and possibly have it multi-valued. This would initialize
   the in-tree sparse-checkout definition immediately upon cloning.
   
   
 * I think there are some ways we could reorganize the Git codebase to make
   our "most basic" sparse-checkout file have even fewer files. Mostly, it
   would require modifying the build to ask "do these files exist on-disk?
   if not, then disable this build option." This would also require adding
   test dependencies that also disable some tests when those do not exist.
   
   
 * Currently, if we have an in-tree sparse-checkout definition, then we
   can't add more directories to it manually. If someone runs git
   sparse-checkout set --in-tree <file> followed by git sparse-checkout add
   <dir>, then will disappear as it is overridden by the in-tree definition.
   The only way I can think to get around this would be to add these
   directories through Git config, since we were using the sparse-checkout
   file as the storage of "what directories do I care about?" but it is no
   longer the source of truth in this situation. I'm open to ideas here, and
   it would be nice to have this interaction hardened before moving the
   series out of RFC status.
   
   

Thanks, -Stolee

Derrick Stolee (10):
  unpack-trees: avoid array out-of-bounds error
  sparse-checkout: move code from builtin
  sparse-checkout: move code from unpack-trees.c
  sparse-checkout: allow in-tree definitions
  sparse-checkout: automatically update in-tree definition
  sparse-checkout: use oidset to prevent repeat blobs
  sparse-checkout: define in-tree dependencies
  Makefile: skip git-gui if dir is missing
  Makefile: disable GETTEXT when 'po' is missing
  .sparse: add in-tree sparse-checkout for Git

 .sparse/base.deps                     |  19 ++
 .sparse/windows.deps                  |   3 +
 Documentation/config.txt              |   2 +
 Documentation/config/sparse.txt       |  15 +
 Documentation/git-sparse-checkout.txt |  70 ++++-
 Makefile                              |  19 +-
 builtin/commit.c                      |   4 +-
 builtin/read-tree.c                   |   4 +
 builtin/sparse-checkout.c             | 268 +++--------------
 read-cache.c                          |   8 +-
 sparse-checkout.c                     | 418 ++++++++++++++++++++++++++
 sparse-checkout.h                     |  30 ++
 t/lib-gettext.sh                      |   1 -
 t/t0200-gettext-basic.sh              |   6 +-
 t/t1091-sparse-checkout-builtin.sh    | 143 +++++++++
 t/test-lib.sh                         |   7 +-
 unpack-trees.c                        |  10 +-
 17 files changed, 785 insertions(+), 242 deletions(-)
 create mode 100644 .sparse/base.deps
 create mode 100644 .sparse/windows.deps
 create mode 100644 Documentation/config/sparse.txt
 create mode 100644 sparse-checkout.c
 create mode 100644 sparse-checkout.h


base-commit: ace224ac5fb120e9cae894e31713ab60e91f141f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-627%2Fderrickstolee%2Fsparse-checkout-in-tree-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-627/derrickstolee/sparse-checkout-in-tree-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/627
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/10] unpack-trees: avoid array out-of-bounds error
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
@ 2020-05-07 13:17 ` Derrick Stolee via GitGitGadget
  2020-05-07 22:27   ` Junio C Hamano
  2020-05-07 13:17 ` [PATCH 02/10] sparse-checkout: move code from builtin Derrick Stolee via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The loop in warn_conflicted_path() that checks for the count of entries
with the same path uses "i+count" for the array entry. However, the loop
only verifies that the value of count is below the array size. Fix this
by adding i to the condition.

I hit this condition during a test of the in-tree sparse-checkout
feature, so it is exercised by the end of the series.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 9a3ccd9d083..4f880f2da90 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -563,10 +563,11 @@ static int warn_conflicted_path(struct index_state *istate,
 	add_rejected_path(o, WARNING_SPARSE_UNMERGED_FILE, conflicting_path);
 
 	/* Find out how many higher stage entries at same path */
-	while (++count < istate->cache_nr &&
+	while (i + ++count < istate->cache_nr &&
 	       !strcmp(conflicting_path,
 		       istate->cache[i+count]->name))
 		/* do nothing */;
+
 	return count;
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 02/10] sparse-checkout: move code from builtin
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
  2020-05-07 13:17 ` [PATCH 01/10] unpack-trees: avoid array out-of-bounds error Derrick Stolee via GitGitGadget
@ 2020-05-07 13:17 ` Derrick Stolee via GitGitGadget
  2020-05-07 13:17 ` [PATCH 03/10] sparse-checkout: move code from unpack-trees.c Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In anticipation of some new features in the sparse-checkout space,
create a new sparse-checkout.c file at root to store the implementation
of these features. Several helper methods from builtin/sparse-checkout.c
are moved to this file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Makefile                  |   1 +
 builtin/sparse-checkout.c | 225 +-----------------------------------
 sparse-checkout.c         | 233 ++++++++++++++++++++++++++++++++++++++
 sparse-checkout.h         |  16 +++
 4 files changed, 251 insertions(+), 224 deletions(-)
 create mode 100644 sparse-checkout.c
 create mode 100644 sparse-checkout.h

diff --git a/Makefile b/Makefile
index 9804a0758b2..2e8029b8026 100644
--- a/Makefile
+++ b/Makefile
@@ -985,6 +985,7 @@ LIB_OBJS += sha1-name.o
 LIB_OBJS += shallow.o
 LIB_OBJS += sideband.o
 LIB_OBJS += sigchain.o
+LIB_OBJS += sparse-checkout.o
 LIB_OBJS += split-index.o
 LIB_OBJS += stable-qsort.o
 LIB_OBJS += strbuf.o
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 95d08824172..fd247e428e4 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -9,11 +9,11 @@
 #include "string-list.h"
 #include "cache.h"
 #include "cache-tree.h"
-#include "lockfile.h"
 #include "resolve-undo.h"
 #include "unpack-trees.h"
 #include "wt-status.h"
 #include "quote.h"
+#include "sparse-checkout.h"
 
 static const char *empty_base = "";
 
@@ -22,30 +22,6 @@ static char const * const builtin_sparse_checkout_usage[] = {
 	NULL
 };
 
-static char *get_sparse_checkout_filename(void)
-{
-	return git_pathdup("info/sparse-checkout");
-}
-
-static void write_patterns_to_file(FILE *fp, struct pattern_list *pl)
-{
-	int i;
-
-	for (i = 0; i < pl->nr; i++) {
-		struct path_pattern *p = pl->patterns[i];
-
-		if (p->flags & PATTERN_FLAG_NEGATIVE)
-			fprintf(fp, "!");
-
-		fprintf(fp, "%s", p->pattern);
-
-		if (p->flags & PATTERN_FLAG_MUSTBEDIR)
-			fprintf(fp, "/");
-
-		fprintf(fp, "\n");
-	}
-}
-
 static int sparse_checkout_list(int argc, const char **argv)
 {
 	struct pattern_list pl;
@@ -92,153 +68,6 @@ static int sparse_checkout_list(int argc, const char **argv)
 	return 0;
 }
 
-static int update_working_directory(struct pattern_list *pl)
-{
-	enum update_sparsity_result result;
-	struct unpack_trees_options o;
-	struct lock_file lock_file = LOCK_INIT;
-	struct repository *r = the_repository;
-
-	memset(&o, 0, sizeof(o));
-	o.verbose_update = isatty(2);
-	o.update = 1;
-	o.head_idx = -1;
-	o.src_index = r->index;
-	o.dst_index = r->index;
-	o.skip_sparse_checkout = 0;
-	o.pl = pl;
-
-	setup_work_tree();
-
-	repo_hold_locked_index(r, &lock_file, LOCK_DIE_ON_ERROR);
-
-	setup_unpack_trees_porcelain(&o, "sparse-checkout");
-	result = update_sparsity(&o);
-	clear_unpack_trees_porcelain(&o);
-
-	if (result == UPDATE_SPARSITY_WARNINGS)
-		/*
-		 * We don't do any special handling of warnings from untracked
-		 * files in the way or dirty entries that can't be removed.
-		 */
-		result = UPDATE_SPARSITY_SUCCESS;
-	if (result == UPDATE_SPARSITY_SUCCESS)
-		write_locked_index(r->index, &lock_file, COMMIT_LOCK);
-	else
-		rollback_lock_file(&lock_file);
-
-	return result;
-}
-
-static char *escaped_pattern(char *pattern)
-{
-	char *p = pattern;
-	struct strbuf final = STRBUF_INIT;
-
-	while (*p) {
-		if (is_glob_special(*p))
-			strbuf_addch(&final, '\\');
-
-		strbuf_addch(&final, *p);
-		p++;
-	}
-
-	return strbuf_detach(&final, NULL);
-}
-
-static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
-{
-	int i;
-	struct pattern_entry *pe;
-	struct hashmap_iter iter;
-	struct string_list sl = STRING_LIST_INIT_DUP;
-	struct strbuf parent_pattern = STRBUF_INIT;
-
-	hashmap_for_each_entry(&pl->parent_hashmap, &iter, pe, ent) {
-		if (hashmap_get_entry(&pl->recursive_hashmap, pe, ent, NULL))
-			continue;
-
-		if (!hashmap_contains_parent(&pl->recursive_hashmap,
-					     pe->pattern,
-					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
-	}
-
-	string_list_sort(&sl);
-	string_list_remove_duplicates(&sl, 0);
-
-	fprintf(fp, "/*\n!/*/\n");
-
-	for (i = 0; i < sl.nr; i++) {
-		char *pattern = escaped_pattern(sl.items[i].string);
-
-		if (strlen(pattern))
-			fprintf(fp, "%s/\n!%s/*/\n", pattern, pattern);
-		free(pattern);
-	}
-
-	string_list_clear(&sl, 0);
-
-	hashmap_for_each_entry(&pl->recursive_hashmap, &iter, pe, ent) {
-		if (!hashmap_contains_parent(&pl->recursive_hashmap,
-					     pe->pattern,
-					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
-	}
-
-	strbuf_release(&parent_pattern);
-
-	string_list_sort(&sl);
-	string_list_remove_duplicates(&sl, 0);
-
-	for (i = 0; i < sl.nr; i++) {
-		char *pattern = escaped_pattern(sl.items[i].string);
-		fprintf(fp, "%s/\n", pattern);
-		free(pattern);
-	}
-}
-
-static int write_patterns_and_update(struct pattern_list *pl)
-{
-	char *sparse_filename;
-	FILE *fp;
-	int fd;
-	struct lock_file lk = LOCK_INIT;
-	int result;
-
-	sparse_filename = get_sparse_checkout_filename();
-
-	if (safe_create_leading_directories(sparse_filename))
-		die(_("failed to create directory for sparse-checkout file"));
-
-	fd = hold_lock_file_for_update(&lk, sparse_filename,
-				      LOCK_DIE_ON_ERROR);
-
-	result = update_working_directory(pl);
-	if (result) {
-		rollback_lock_file(&lk);
-		free(sparse_filename);
-		clear_pattern_list(pl);
-		update_working_directory(NULL);
-		return result;
-	}
-
-	fp = xfdopen(fd, "w");
-
-	if (core_sparse_checkout_cone)
-		write_cone_to_file(fp, pl);
-	else
-		write_patterns_to_file(fp, pl);
-
-	fflush(fp);
-	commit_lock_file(&lk);
-
-	free(sparse_filename);
-	clear_pattern_list(pl);
-
-	return 0;
-}
-
 enum sparse_checkout_mode {
 	MODE_NO_PATTERNS = 0,
 	MODE_ALL_PATTERNS = 1,
@@ -339,58 +168,6 @@ static int sparse_checkout_init(int argc, const char **argv)
 	return write_patterns_and_update(&pl);
 }
 
-static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *path)
-{
-	struct pattern_entry *e = xmalloc(sizeof(*e));
-	e->patternlen = path->len;
-	e->pattern = strbuf_detach(path, NULL);
-	hashmap_entry_init(&e->ent,
-			   ignore_case ?
-			   strihash(e->pattern) :
-			   strhash(e->pattern));
-
-	hashmap_add(&pl->recursive_hashmap, &e->ent);
-
-	while (e->patternlen) {
-		char *slash = strrchr(e->pattern, '/');
-		char *oldpattern = e->pattern;
-		size_t newlen;
-
-		if (slash == e->pattern)
-			break;
-
-		newlen = slash - e->pattern;
-		e = xmalloc(sizeof(struct pattern_entry));
-		e->patternlen = newlen;
-		e->pattern = xstrndup(oldpattern, newlen);
-		hashmap_entry_init(&e->ent,
-				   ignore_case ?
-				   strihash(e->pattern) :
-				   strhash(e->pattern));
-
-		if (!hashmap_get_entry(&pl->parent_hashmap, e, ent, NULL))
-			hashmap_add(&pl->parent_hashmap, &e->ent);
-	}
-}
-
-static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
-{
-	strbuf_trim(line);
-
-	strbuf_trim_trailing_dir_sep(line);
-
-	if (strbuf_normalize_path(line))
-		die(_("could not normalize path %s"), line->buf);
-
-	if (!line->len)
-		return;
-
-	if (line->buf[0] != '/')
-		strbuf_insertstr(line, 0, "/");
-
-	insert_recursive_pattern(pl, line);
-}
-
 static char const * const builtin_sparse_checkout_set_usage[] = {
 	N_("git sparse-checkout (set|add) (--stdin | <patterns>)"),
 	NULL
diff --git a/sparse-checkout.c b/sparse-checkout.c
new file mode 100644
index 00000000000..5ada307b42c
--- /dev/null
+++ b/sparse-checkout.c
@@ -0,0 +1,233 @@
+#include "cache.h"
+#include "config.h"
+#include "dir.h"
+#include "lockfile.h"
+#include "quote.h"
+#include "repository.h"
+#include "sparse-checkout.h"
+#include "strbuf.h"
+#include "string-list.h"
+#include "unpack-trees.h"
+
+char *get_sparse_checkout_filename(void)
+{
+	return git_pathdup("info/sparse-checkout");
+}
+
+void write_patterns_to_file(FILE *fp, struct pattern_list *pl)
+{
+	int i;
+
+	for (i = 0; i < pl->nr; i++) {
+		struct path_pattern *p = pl->patterns[i];
+
+		if (p->flags & PATTERN_FLAG_NEGATIVE)
+			fprintf(fp, "!");
+
+		fprintf(fp, "%s", p->pattern);
+
+		if (p->flags & PATTERN_FLAG_MUSTBEDIR)
+			fprintf(fp, "/");
+
+		fprintf(fp, "\n");
+	}
+}
+
+int update_working_directory(struct pattern_list *pl)
+{
+	enum update_sparsity_result result;
+	struct unpack_trees_options o;
+	struct lock_file lock_file = LOCK_INIT;
+	struct repository *r = the_repository;
+
+	memset(&o, 0, sizeof(o));
+	o.verbose_update = isatty(2);
+	o.update = 1;
+	o.head_idx = -1;
+	o.src_index = r->index;
+	o.dst_index = r->index;
+	o.skip_sparse_checkout = 0;
+	o.pl = pl;
+
+	setup_work_tree();
+
+	repo_hold_locked_index(r, &lock_file, LOCK_DIE_ON_ERROR);
+
+	setup_unpack_trees_porcelain(&o, "sparse-checkout");
+	result = update_sparsity(&o);
+	clear_unpack_trees_porcelain(&o);
+
+	if (result == UPDATE_SPARSITY_WARNINGS)
+		/*
+		 * We don't do any special handling of warnings from untracked
+		 * files in the way or dirty entries that can't be removed.
+		 */
+		result = UPDATE_SPARSITY_SUCCESS;
+	if (result == UPDATE_SPARSITY_SUCCESS)
+		write_locked_index(r->index, &lock_file, COMMIT_LOCK);
+	else
+		rollback_lock_file(&lock_file);
+
+	return result;
+}
+
+static char *escaped_pattern(char *pattern)
+{
+	char *p = pattern;
+	struct strbuf final = STRBUF_INIT;
+
+	while (*p) {
+		if (is_glob_special(*p))
+			strbuf_addch(&final, '\\');
+
+		strbuf_addch(&final, *p);
+		p++;
+	}
+
+	return strbuf_detach(&final, NULL);
+}
+
+static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
+{
+	int i;
+	struct pattern_entry *pe;
+	struct hashmap_iter iter;
+	struct string_list sl = STRING_LIST_INIT_DUP;
+	struct strbuf parent_pattern = STRBUF_INIT;
+
+	hashmap_for_each_entry(&pl->parent_hashmap, &iter, pe, ent) {
+		if (hashmap_get_entry(&pl->recursive_hashmap, pe, ent, NULL))
+			continue;
+
+		if (!hashmap_contains_parent(&pl->recursive_hashmap,
+					     pe->pattern,
+					     &parent_pattern))
+			string_list_insert(&sl, pe->pattern);
+	}
+
+	string_list_sort(&sl);
+	string_list_remove_duplicates(&sl, 0);
+
+	fprintf(fp, "/*\n!/*/\n");
+
+	for (i = 0; i < sl.nr; i++) {
+		char *pattern = escaped_pattern(sl.items[i].string);
+
+		if (strlen(pattern))
+			fprintf(fp, "%s/\n!%s/*/\n", pattern, pattern);
+		free(pattern);
+	}
+
+	string_list_clear(&sl, 0);
+
+	hashmap_for_each_entry(&pl->recursive_hashmap, &iter, pe, ent) {
+		if (!hashmap_contains_parent(&pl->recursive_hashmap,
+					     pe->pattern,
+					     &parent_pattern))
+			string_list_insert(&sl, pe->pattern);
+	}
+
+	strbuf_release(&parent_pattern);
+
+	string_list_sort(&sl);
+	string_list_remove_duplicates(&sl, 0);
+
+	for (i = 0; i < sl.nr; i++) {
+		char *pattern = escaped_pattern(sl.items[i].string);
+		fprintf(fp, "%s/\n", pattern);
+		free(pattern);
+	}
+}
+
+int write_patterns_and_update(struct pattern_list *pl)
+{
+	char *sparse_filename;
+	FILE *fp;
+	int fd;
+	struct lock_file lk = LOCK_INIT;
+	int result;
+
+	sparse_filename = get_sparse_checkout_filename();
+
+	if (safe_create_leading_directories(sparse_filename))
+		die(_("failed to create directory for sparse-checkout file"));
+
+	fd = hold_lock_file_for_update(&lk, sparse_filename,
+				      LOCK_DIE_ON_ERROR);
+
+	result = update_working_directory(pl);
+	if (result) {
+		rollback_lock_file(&lk);
+		free(sparse_filename);
+		clear_pattern_list(pl);
+		update_working_directory(NULL);
+		return result;
+	}
+
+	fp = xfdopen(fd, "w");
+
+	if (core_sparse_checkout_cone)
+		write_cone_to_file(fp, pl);
+	else
+		write_patterns_to_file(fp, pl);
+
+	fflush(fp);
+	commit_lock_file(&lk);
+
+	free(sparse_filename);
+	clear_pattern_list(pl);
+
+	return 0;
+}
+
+void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *path)
+{
+	struct pattern_entry *e = xmalloc(sizeof(*e));
+	e->patternlen = path->len;
+	e->pattern = strbuf_detach(path, NULL);
+	hashmap_entry_init(&e->ent,
+			   ignore_case ?
+			   strihash(e->pattern) :
+			   strhash(e->pattern));
+
+	hashmap_add(&pl->recursive_hashmap, &e->ent);
+
+	while (e->patternlen) {
+		char *slash = strrchr(e->pattern, '/');
+		char *oldpattern = e->pattern;
+		size_t newlen;
+
+		if (slash == e->pattern)
+			break;
+
+		newlen = slash - e->pattern;
+		e = xmalloc(sizeof(struct pattern_entry));
+		e->patternlen = newlen;
+		e->pattern = xstrndup(oldpattern, newlen);
+		hashmap_entry_init(&e->ent,
+				   ignore_case ?
+				   strihash(e->pattern) :
+				   strhash(e->pattern));
+
+		if (!hashmap_get_entry(&pl->parent_hashmap, e, ent, NULL))
+			hashmap_add(&pl->parent_hashmap, &e->ent);
+	}
+}
+
+void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
+{
+	strbuf_trim(line);
+
+	strbuf_trim_trailing_dir_sep(line);
+
+	if (strbuf_normalize_path(line))
+		die(_("could not normalize path %s"), line->buf);
+
+	if (!line->len)
+		return;
+
+	if (line->buf[0] != '/')
+		strbuf_insertstr(line, 0, "/");
+
+	insert_recursive_pattern(pl, line);
+}
diff --git a/sparse-checkout.h b/sparse-checkout.h
new file mode 100644
index 00000000000..4148832760d
--- /dev/null
+++ b/sparse-checkout.h
@@ -0,0 +1,16 @@
+#ifndef SPARSE_CHECKOUT_H
+#define SPARSE_CHECKOUT_H
+
+#include "cache.h"
+#include "repository.h"
+
+struct pattern_list;
+
+char *get_sparse_checkout_filename(void);
+void write_patterns_to_file(FILE *fp, struct pattern_list *pl);
+int update_working_directory(struct pattern_list *pl);
+int write_patterns_and_update(struct pattern_list *pl);
+void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *path);
+void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl);
+
+#endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/10] sparse-checkout: move code from unpack-trees.c
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
  2020-05-07 13:17 ` [PATCH 01/10] unpack-trees: avoid array out-of-bounds error Derrick Stolee via GitGitGadget
  2020-05-07 13:17 ` [PATCH 02/10] sparse-checkout: move code from builtin Derrick Stolee via GitGitGadget
@ 2020-05-07 13:17 ` Derrick Stolee via GitGitGadget
  2020-05-07 13:17 ` [PATCH 04/10] sparse-checkout: allow in-tree definitions Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

There is no need to have two ways to load the sparse-checkout patterns,
especially duplicating the info/sparse-checkout filename. Move this code
out of unpack-trees.c and into sparse-checkout.c.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-checkout.c | 12 ++++++++++++
 sparse-checkout.h |  1 +
 unpack-trees.c    |  7 ++-----
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/sparse-checkout.c b/sparse-checkout.c
index 5ada307b42c..875b620568d 100644
--- a/sparse-checkout.c
+++ b/sparse-checkout.c
@@ -33,6 +33,18 @@ void write_patterns_to_file(FILE *fp, struct pattern_list *pl)
 	}
 }
 
+int populate_sparse_checkout_patterns(struct pattern_list *pl)
+{
+	int result;
+	char *sparse = get_sparse_checkout_filename();
+
+	pl->use_cone_patterns = core_sparse_checkout_cone;
+	result = add_patterns_from_file_to_list(sparse, "", 0, pl, NULL);
+	free(sparse);
+
+	return result;
+}
+
 int update_working_directory(struct pattern_list *pl)
 {
 	enum update_sparsity_result result;
diff --git a/sparse-checkout.h b/sparse-checkout.h
index 4148832760d..e0c840f07f9 100644
--- a/sparse-checkout.h
+++ b/sparse-checkout.h
@@ -7,6 +7,7 @@
 struct pattern_list;
 
 char *get_sparse_checkout_filename(void);
+int populate_sparse_checkout_patterns(struct pattern_list *pl);
 void write_patterns_to_file(FILE *fp, struct pattern_list *pl);
 int update_working_directory(struct pattern_list *pl);
 int write_patterns_and_update(struct pattern_list *pl);
diff --git a/unpack-trees.c b/unpack-trees.c
index 4f880f2da90..6f9bdeb57a0 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -16,6 +16,7 @@
 #include "fsmonitor.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "sparse-checkout.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -1550,14 +1551,10 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
 static void populate_from_existing_patterns(struct unpack_trees_options *o,
 					    struct pattern_list *pl)
 {
-	char *sparse = git_pathdup("info/sparse-checkout");
-
-	pl->use_cone_patterns = core_sparse_checkout_cone;
-	if (add_patterns_from_file_to_list(sparse, "", 0, pl, NULL) < 0)
+	if (populate_sparse_checkout_patterns(pl) < 0)
 		o->skip_sparse_checkout = 1;
 	else
 		o->pl = pl;
-	free(sparse);
 }
 
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/10] sparse-checkout: allow in-tree definitions
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-05-07 13:17 ` [PATCH 03/10] sparse-checkout: move code from unpack-trees.c Derrick Stolee via GitGitGadget
@ 2020-05-07 13:17 ` Derrick Stolee via GitGitGadget
  2020-05-07 22:58   ` Junio C Hamano
  2020-05-07 13:17 ` [PATCH 05/10] sparse-checkout: automatically update in-tree definition Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

One of the difficulties of using the sparse-checkout feature is not
knowing which directories are absolutely needed for working in a portion
of the repository. Some of this can be documented in README files or
included in a bootstrapping tool along with the repository. This is done
in an ad-hoc way by every project that wants to use it.

Let's make this process easier for users by creating a way to define a
useful sparse-checkout definition inside the Git tree data. This has
several benefits. In particular, the data is available to anyone who has
a copy of the repository without needing a different data source.
Second, the needs of the repository can change over time and Git can
present a way to automatically update the working directory as these
sparse-checkout definitions change over time.

When sitting down to design this feature, there were several possible
options.

The first option is to literally include files in the repository that
could replace the sparse-checkout file. This presents full generality
for the repository, but the sparse-checkout patterns may look strange to
a non-expert. While the general case could be useful to some, we are
actively working to make "cone mode" the generally accepted way to work
with sparse-checkout. In cone mode, the user selects a set of
directories to include and all matches are directory-based prefix
matches. This is much faster but also much easier to understand.

Another option would be to interpret a file as a list of necessary
directories. This would match the input to 'git sparse-checkout set'
and output of 'git sparse-checkout list' when in cone mode. However,
there are subtleties around custom data parsers that may not make this
the safest approach.

Both of the above options also suffer from a drawback that the file
format is too simple to allow for extensions in the future. For example,
in a repository with many dependent parts, changing a necessary
directory in a core component would require updating every in-tree
sparse-checkout definition. Since those files are likely very long, this
presents a lot of noise for users to handle as they update these
dependencies.

Instead, let's create a format that can easily be extended to satisfy
this need and any other needs we may want to add in the future. For this
reason, I selected the config file format for these in-tree
sparse-checkout definitions.

To use this feature, add the "--in-tree" option when setting or adding
directories to the sparse-checkout definition. For example:

  $ git sparse-checkout set --in-tree .sparse/base
  $ git sparse-checkout add --in-tree .sparse/extra

These commands add values to the multi-valued config setting
"sparse.inTree". When updating the sparse-checkout definition, these
values describe paths in the repository to find the sparse-checkout
data. After the commands listed earlier, we expect to see the following
in .git/config.worktree:

	[sparse]
		intree = .sparse/base
		intree = .sparse/extra

When applying the sparse-checkout definitions from this config, the
blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded. In those
files, the multi-valued config values "sparse.dir" are considered as
the directories to construct a cone mode sparse-checkout file. The end
result is as if these paths were provided to "git sparse-checkout set"
in cone mode.

For example, suppose .sparse/base had the following content:

	[sparse]
		dir = A
		dir = B/C
		dir = D/E/F

and .sparse/extra had the following content:

	[sparse]
		dir = D
		dir = X

Then, the output of "git sparse-checkout list" would be

	A
	B/C
	D
	X

Note that since "D" contains "D/E/F", that directory replaces the
position of "D/E/F" in the list.

Since these are parsed using the config library, the parser is robust
enough to understand comments and complicated string values.

The key benefit to this approach is that it can be extended by defining
new config values. In a later change, we will introduce "sparse.inherit"
to point to another file in the tree. This will solve the problem of
editing many files when core dependencies change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt              |   2 +
 Documentation/config/sparse.txt       |  15 +++
 Documentation/git-sparse-checkout.txt |  43 ++++++++-
 builtin/sparse-checkout.c             |  43 +++++++++
 sparse-checkout.c                     | 128 +++++++++++++++++++++++++-
 sparse-checkout.h                     |  11 +++
 t/t1091-sparse-checkout-builtin.sh    | 101 ++++++++++++++++++++
 7 files changed, 337 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/config/sparse.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 08b13ba72be..40f44948229 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -435,6 +435,8 @@ include::config/sequencer.txt[]
 
 include::config/showbranch.txt[]
 
+include::config/sparse.txt[]
+
 include::config/splitindex.txt[]
 
 include::config/ssh.txt[]
diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
new file mode 100644
index 00000000000..c1fce87cd33
--- /dev/null
+++ b/Documentation/config/sparse.txt
@@ -0,0 +1,15 @@
+sparse.inTree::
+	The `core.sparseCheckout` config option enables the `sparse-checkout`
+	feature, but if there are any values for the multi-valued
+	`sparse.inTree` config option, then the sparse-checkout patterns are
+	defined by parsing the files listed in these values. See
+	linkgit:git-sparse-checkout[1] for more information.
+
+sparse.dir::
+	This config setting is ignored if present in the repository config.
+	Instead, this multi-valued option is present in the files listed by
+	`sparse.inTree` and specifies the directories needed in the
+	working directory. The union of all `sparse.dir` values across all
+	`sparse.inTree` files forms the input for `git sparse-checkout set`
+	in cone mode.  See linkgit:git-sparse-checkout[1] for more
+	information.
diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index 1a3ace60820..da9322c5e41 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -62,13 +62,20 @@ directories (recursively) as well as files that are siblings of ancestor
 directories. The input format matches the output of `git ls-tree --name-only`.
 This includes interpreting pathnames that begin with a double quote (") as
 C-style quoted strings.
++
+When the `--in-tree` option is provided, the paths provided are interpreted
+as files within the working directory that are used to construct the
+`sparse-checkout` patterns. See 'IN-TREE PATTERN SET' below.
 
 'add'::
 	Update the sparse-checkout file to include additional patterns.
 	By default, these patterns are read from the command-line arguments,
 	but they can be read from stdin using the `--stdin` option. When
 	`core.sparseCheckoutCone` is enabled, the given patterns are interpreted
-	as directory names as in the 'set' subcommand.
+	as directory names as in the 'set' subcommand. When the `--in-tree`
+	option is provided, the input is interpreted as locations of files
+	describing a sparse-checkout definition as in the 'set' subcommand
+	and the 'IN-TREE PATTERN SET' section below.
 
 'reapply::
 	Reapply the sparsity pattern rules to paths in the working tree.
@@ -197,6 +204,40 @@ case-insensitive check. This corrects for case mismatched filenames in the
 directory.
 
 
+IN-TREE PATTERN SET
+-------------------
+
+As your project changes, your sparse-checkout pattern sets may also change.
+It is important to be able to construct a valid sparse-checkout pattern set
+when switching between points in history. The in-tree pattern sets allow
+versioning cone-mode sparse-checkout patterns next to your other artifacts.
+
+To enable the feature, create a sparse-checkout definition using the Git
+config format. The file should specify the multi-valued config variable
+`sparse.dir` to a list of directories to include in the sparse-checkout
+definition. If multiple files are specified, the resulting sparse-checkout
+definition is the union of all directories from all such files. For
+example, the following file contains a list of three directories, `A`,
+`B/C`, and `D/E/F`:
+
+----------------------------------
+[sparse]
+	dir = A
+	dir = B/C
+# Comments are allowed to describe
+# why a directory is necessary
+	dir = D/E/F
+----------------------------------
+
+Use `git sparse-checkout set --in-tree <path>` to initialize the patterns
+to those included in the file at `<path>`. This will override any existing
+patterns you have in your sparse-checkout file.
+
+After switching between commits with different versions of this file, run
+`git sparse-checkout reapply` to adjust the sparse-checkout patterns to
+the new definition.
+
+
 SUBMODULES
 ----------
 
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index fd247e428e4..621f1801c03 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -175,6 +175,7 @@ static char const * const builtin_sparse_checkout_set_usage[] = {
 
 static struct sparse_checkout_set_opts {
 	int use_stdin;
+	int in_tree;
 } set_opts;
 
 static void add_patterns_from_input(struct pattern_list *pl,
@@ -312,12 +313,52 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
 	return result;
 }
 
+static int modify_in_tree_list(int argc, const char **argv, enum modify_type m)
+{
+	int result = 0;
+	int i;
+	struct string_list sl = STRING_LIST_INIT_DUP;
+	struct pattern_list pl;
+
+	memset(&pl, 0, sizeof(pl));
+
+	switch(m) {
+	case ADD:
+		if (load_in_tree_list_from_config(the_repository, &sl))
+			return 1;
+		if (!sl.nr)
+			warning(_("the existing in-tree config has no entries; this overwrites the existing sparse-checkout definition."));
+		populate_sparse_checkout_patterns(&pl);
+		break;
+
+	case REPLACE:
+		hashmap_init(&pl.recursive_hashmap, pl_hashmap_cmp, NULL, 0);
+		hashmap_init(&pl.parent_hashmap, pl_hashmap_cmp, NULL, 0);
+		break;
+	}
+
+	for (i = 0; i < argc; i++)
+		string_list_insert(&sl, argv[i]);
+
+	if (load_in_tree_pattern_list(the_repository, &sl, &pl) ||
+	    set_sparse_in_tree_config(the_repository, &sl) ||
+	    write_patterns_and_update(&pl))
+		result = 1;
+
+	string_list_clear(&sl, 0);
+	clear_pattern_list(&pl);
+
+	return result;
+}
+
 static int sparse_checkout_set(int argc, const char **argv, const char *prefix,
 			       enum modify_type m)
 {
 	static struct option builtin_sparse_checkout_set_options[] = {
 		OPT_BOOL(0, "stdin", &set_opts.use_stdin,
 			 N_("read patterns from standard in")),
+		OPT_BOOL(0, "in-tree", &set_opts.in_tree,
+			 N_("define the sparse-checkout from files in the tree")),
 		OPT_END(),
 	};
 
@@ -328,6 +369,8 @@ static int sparse_checkout_set(int argc, const char **argv, const char *prefix,
 			     builtin_sparse_checkout_set_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
 
+	if (set_opts.in_tree)
+		return modify_in_tree_list(argc, argv, m);
 	return modify_pattern_list(argc, argv, m);
 }
 
diff --git a/sparse-checkout.c b/sparse-checkout.c
index 875b620568d..d6c27ca19c4 100644
--- a/sparse-checkout.c
+++ b/sparse-checkout.c
@@ -8,6 +8,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "unpack-trees.h"
+#include "object-store.h"
 
 char *get_sparse_checkout_filename(void)
 {
@@ -33,14 +34,113 @@ void write_patterns_to_file(FILE *fp, struct pattern_list *pl)
 	}
 }
 
+int load_in_tree_list_from_config(struct repository *r,
+				  struct string_list *sl)
+{
+	struct string_list_item *item;
+	const struct string_list *cl;
+
+	cl = repo_config_get_value_multi(r, SPARSE_CHECKOUT_IN_TREE);
+
+	if (!cl)
+		return 1;
+
+	for_each_string_list_item(item, cl)
+		string_list_insert(sl, item->string);
+
+	return 0;
+}
+
+static int sparse_dir_cb(const char *var, const char *value, void *data)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct pattern_list *pl = (struct pattern_list *)data;
+
+	if (!strcmp(var, SPARSE_CHECKOUT_DIR)) {
+		strbuf_addstr(&path, value);
+		strbuf_to_cone_pattern(&path, pl);
+		strbuf_release(&path);
+	}
+
+	return 0;
+}
+
+static int load_in_tree_from_blob(struct pattern_list *pl,
+				  struct object_id *oid)
+{
+	return git_config_from_blob_oid(sparse_dir_cb,
+					SPARSE_CHECKOUT_DIR,
+					oid, pl);
+}
+
+int load_in_tree_pattern_list(struct repository *r,
+			      struct string_list *sl,
+			      struct pattern_list *pl)
+{
+	struct index_state *istate = r->index;
+	struct string_list_item *item;
+	struct strbuf path = STRBUF_INIT;
+
+	pl->use_cone_patterns = 1;
+
+	for_each_string_list_item(item, sl) {
+		struct object_id *oid;
+		enum object_type type;
+		int pos = index_name_pos(istate, item->string, strlen(item->string));
+
+		/*
+		 * Exit silently, as this is likely the case where Git
+		 * changed branches to a location where the inherit file
+		 * does not exist. Do not update the sparse-checkout.
+		 */
+		if (pos < 0)
+			return 1;
+
+		oid = &istate->cache[pos]->oid;
+		type = oid_object_info(r, oid, NULL);
+
+		if (type != OBJ_BLOB) {
+			warning(_("expected a file at '%s'; not updating sparse-checkout"),
+				oid_to_hex(oid));
+			return 1;
+		}
+
+		load_in_tree_from_blob(pl, oid);
+	}
+
+	strbuf_release(&path);
+
+	return 0;
+}
+
 int populate_sparse_checkout_patterns(struct pattern_list *pl)
 {
 	int result;
-	char *sparse = get_sparse_checkout_filename();
-
-	pl->use_cone_patterns = core_sparse_checkout_cone;
-	result = add_patterns_from_file_to_list(sparse, "", 0, pl, NULL);
-	free(sparse);
+	const char *in_tree;
+
+	if (!git_config_get_value(SPARSE_CHECKOUT_IN_TREE, &in_tree) &&
+	    in_tree) {
+		struct string_list paths = STRING_LIST_INIT_DUP;
+		/* If we do not have this config, skip this step! */
+		if (load_in_tree_list_from_config(the_repository, &paths) ||
+		    !paths.nr)
+			return 1;
+
+		/* Check diff for paths over from/to. If any changed, reload. */
+		/* or for now, reload always! */
+		hashmap_init(&pl->recursive_hashmap, pl_hashmap_cmp, NULL, 0);
+		hashmap_init(&pl->parent_hashmap, pl_hashmap_cmp, NULL, 0);
+		pl->use_cone_patterns = 1;
+
+		result = load_in_tree_pattern_list(the_repository, &paths, pl);
+		string_list_clear(&paths, 0);
+	} else {
+		char *sparse = get_sparse_checkout_filename();
+
+		pl->use_cone_patterns = core_sparse_checkout_cone;
+		result = add_patterns_from_file_to_list(sparse, "", 0, pl, NULL);
+		free(sparse);
+	}
 
 	return result;
 }
@@ -243,3 +343,21 @@ void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 
 	insert_recursive_pattern(pl, line);
 }
+
+int set_sparse_in_tree_config(struct repository *r, struct string_list *sl)
+{
+	struct string_list_item *item;
+	const char *config_path = git_path("config.worktree");
+
+	/* clear existing values */
+	git_config_set_multivar_in_file_gently(config_path,
+					       SPARSE_CHECKOUT_IN_TREE,
+					       NULL, NULL, 1);
+
+	for_each_string_list_item(item, sl)
+		git_config_set_multivar_in_file_gently(
+			config_path, SPARSE_CHECKOUT_IN_TREE,
+			item->string, CONFIG_REGEX_NONE, 0);
+
+	return 0;
+}
diff --git a/sparse-checkout.h b/sparse-checkout.h
index e0c840f07f9..993a5701a60 100644
--- a/sparse-checkout.h
+++ b/sparse-checkout.h
@@ -4,14 +4,25 @@
 #include "cache.h"
 #include "repository.h"
 
+#define SPARSE_CHECKOUT_DIR "sparse.dir"
+#define SPARSE_CHECKOUT_IN_TREE "sparse.intree"
+
 struct pattern_list;
 
 char *get_sparse_checkout_filename(void);
 int populate_sparse_checkout_patterns(struct pattern_list *pl);
 void write_patterns_to_file(FILE *fp, struct pattern_list *pl);
 int update_working_directory(struct pattern_list *pl);
+int write_patterns(struct pattern_list *pl, int and_update);
 int write_patterns_and_update(struct pattern_list *pl);
 void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *path);
 void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl);
 
+int load_in_tree_list_from_config(struct repository *r,
+				  struct string_list *sl);
+int load_in_tree_pattern_list(struct repository *r,
+			      struct string_list *sl,
+			      struct pattern_list *pl);
+int set_sparse_in_tree_config(struct repository *r, struct string_list *sl);
+
 #endif
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 88cdde255cd..1040bf9c261 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -604,4 +604,105 @@ test_expect_success MINGW 'cone mode replaces backslashes with slashes' '
 	check_files repo/deep a deeper1
 '
 
+test_expect_success 'basis of --in-tree' '
+	git -C repo config auto.crlf false &&
+	cat >folder1 <<-\EOF &&
+	[sparse]
+		dir = folder1
+	EOF
+	cat >folder2 <<-\EOF &&
+	[sparse]
+		dir = folder2
+	EOF
+	cat >deep <<-\EOF &&
+	[sparse]
+		dir = deep
+	EOF
+	cat >deeper1 <<-\EOF &&
+	[sparse]
+		dir = deep/deeper1
+	EOF
+	cat >sparse <<-\EOF &&
+	[sparse]
+		dir = .sparse
+	EOF
+	mkdir repo/.sparse &&
+	for file in folder1 folder2 deep deeper1 sparse
+	do
+		cp $file repo/.sparse/ || return 1
+	done &&
+	git -C repo add .sparse &&
+	git -C repo commit -m "Add sparse specifications" &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/folder1 &&
+	check_files repo a folder1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/folder1 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a folder1 &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/folder2 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/folder2 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a folder2 &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/deeper1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/deeper1 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a deep &&
+	check_files repo/deep a deeper1 &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/deeper1 .sparse/deep .sparse/folder1 &&
+	check_files repo a deep folder1 &&
+	check_files repo/deep a deeper1 deeper2 &&
+	cat >expect-list <<-EOF &&
+	deep
+	folder1
+	EOF
+	git -C repo sparse-checkout list >actual-list &&
+	test_cmp expect-list actual-list &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/folder1 .sparse/deeper1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	cat >expect-config <<-\EOF &&
+	.sparse/deeper1
+	.sparse/folder1
+	EOF
+	test_cmp expect-config actual-config &&
+	check_files repo a deep folder1
+'
+
+test_expect_success '"add" with --in-tree' '
+	git -C repo sparse-checkout set --in-tree .sparse/folder1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/folder1 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a folder1 &&
+	git -C repo sparse-checkout add --in-tree .sparse/deeper1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	cat >expect-config <<-\EOF &&
+	.sparse/deeper1
+	.sparse/folder1
+	EOF
+	test_cmp expect-config actual-config &&
+	check_files repo a deep folder1
+'
+
+test_expect_success 'reapply after updating in-tree file' '
+	git -C repo sparse-checkout set --in-tree .sparse/sparse &&
+	check_files repo a &&
+	test_path_is_dir repo/.sparse &&
+	echo "\tdir = folder1" >>repo/.sparse/sparse &&
+	git -C repo commit -a -m "Update sparse file" &&
+	git -C repo sparse-checkout reapply &&
+	check_files repo a folder1 &&
+	test_path_is_dir repo/.sparse &&
+	git -C repo checkout HEAD~1 &&
+	git -C repo sparse-checkout reapply &&
+	check_files repo a &&
+	test_path_is_dir repo/.sparse
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/10] sparse-checkout: automatically update in-tree definition
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-05-07 13:17 ` [PATCH 04/10] sparse-checkout: allow in-tree definitions Derrick Stolee via GitGitGadget
@ 2020-05-07 13:17 ` Derrick Stolee via GitGitGadget
  2020-05-20 16:28   ` Elijah Newren
  2020-05-07 13:17 ` [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A benefit of having sparse-checkouts defined in the tree data is that
the sparse-checkout definition can change over time. To take advantage
of that data, a user could run "git sparse-checkout reapply" whenever
they thought the sparse-checkout definition was out of date. That is
likely too manual of a process, since a user would probably only
discover this after a failure to accomplish their goal, such as a build
failure.

Prevent user frustration by automatically updating the sparse-checkout
definition after changing the index when an in-tree definition is
provided by config. This will happen during every index change,
including every step of a rebase, including conflict states.

Special care was needed around the --no-sparse-checkout option in "git
read-tree". This previously relied on changing the skip_sparse_checkout
option in "struct unpack_trees_options" to prevent applying the
skip-worktree bits. However, now that we make a second update to the
index focusing on the skip-worktree bits, this needs to be prevented in
another way. The simplest thing to do was disable the feature through
the core_apply_sparse_checkout global variable.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt |  7 ++++---
 builtin/commit.c                      |  4 +++-
 builtin/read-tree.c                   |  4 ++++
 read-cache.c                          |  8 +++++---
 sparse-checkout.c                     | 26 +++++++++++++++++++++++++-
 sparse-checkout.h                     |  1 +
 t/t1091-sparse-checkout-builtin.sh    | 19 +++++++++++++++----
 7 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index da9322c5e41..c1713ebb1d2 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -233,9 +233,10 @@ Use `git sparse-checkout set --in-tree <path>` to initialize the patterns
 to those included in the file at `<path>`. This will override any existing
 patterns you have in your sparse-checkout file.
 
-After switching between commits with different versions of this file, run
-`git sparse-checkout reapply` to adjust the sparse-checkout patterns to
-the new definition.
+As Git switches between commits, it will update the in-tree sparse-checkout
+definition according to the files available at the new commit. If any of
+the specified files do not exist at the new commit, then the sparse-checkout
+definition will not change.
 
 
 SUBMODULES
diff --git a/builtin/commit.c b/builtin/commit.c
index 7ba33a3bec4..0eab8d74469 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -36,6 +36,7 @@
 #include "help.h"
 #include "commit-reach.h"
 #include "commit-graph.h"
+#include "sparse-checkout.h"
 
 static const char * const builtin_commit_usage[] = {
 	N_("git commit [<options>] [--] <pathspec>..."),
@@ -222,7 +223,8 @@ static int commit_index_files(void)
 	case COMMIT_AS_IS:
 		break; /* nothing to do */
 	case COMMIT_NORMAL:
-		err = commit_lock_file(&index_lock);
+		err = commit_lock_file(&index_lock) ||
+		      update_in_tree_sparse_checkout();
 		break;
 	case COMMIT_PARTIAL:
 		err = commit_lock_file(&index_lock);
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index af7424b94c8..9ae81ffffa1 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -247,6 +247,10 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 		parse_tree(tree);
 		init_tree_desc(t+i, tree->buffer, tree->size);
 	}
+
+	if (opts.skip_sparse_checkout)
+		core_apply_sparse_checkout = 0;
+
 	if (unpack_trees(nr_trees, t, &opts))
 		return 128;
 
diff --git a/read-cache.c b/read-cache.c
index aa427c5c170..150e73feb0d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -25,6 +25,7 @@
 #include "fsmonitor.h"
 #include "thread-utils.h"
 #include "progress.h"
+#include "sparse-checkout.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -3074,9 +3075,10 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 
 	if (ret)
 		return ret;
-	if (flags & COMMIT_LOCK)
-		ret = commit_locked_index(lock);
-	else
+	if (flags & COMMIT_LOCK) {
+		ret = commit_locked_index(lock) ||
+		      update_in_tree_sparse_checkout();
+	} else
 		ret = close_lock_file_gently(lock);
 
 	run_hook_le(NULL, "post-index-change",
diff --git a/sparse-checkout.c b/sparse-checkout.c
index d6c27ca19c4..6c58fda9722 100644
--- a/sparse-checkout.c
+++ b/sparse-checkout.c
@@ -92,9 +92,12 @@ int load_in_tree_pattern_list(struct repository *r,
 		 * Exit silently, as this is likely the case where Git
 		 * changed branches to a location where the inherit file
 		 * does not exist. Do not update the sparse-checkout.
+		 *
+		 * Use -1 return to ensure populate_from_existing_patterns()
+		 * skips the sparse-checkout updates.
 		 */
 		if (pos < 0)
-			return 1;
+			return -1;
 
 		oid = &istate->cache[pos]->oid;
 		type = oid_object_info(r, oid, NULL);
@@ -145,6 +148,7 @@ int populate_sparse_checkout_patterns(struct pattern_list *pl)
 	return result;
 }
 
+static int updating_sparse_checkout = 0;
 int update_working_directory(struct pattern_list *pl)
 {
 	enum update_sparsity_result result;
@@ -152,6 +156,10 @@ int update_working_directory(struct pattern_list *pl)
 	struct lock_file lock_file = LOCK_INIT;
 	struct repository *r = the_repository;
 
+	if (updating_sparse_checkout)
+		return 0;
+	updating_sparse_checkout = 1;
+
 	memset(&o, 0, sizeof(o));
 	o.verbose_update = isatty(2);
 	o.update = 1;
@@ -180,9 +188,24 @@ int update_working_directory(struct pattern_list *pl)
 	else
 		rollback_lock_file(&lock_file);
 
+	updating_sparse_checkout = 0;
 	return result;
 }
 
+int update_in_tree_sparse_checkout(void)
+{
+	const char *first_value;
+
+	if (!core_apply_sparse_checkout)
+		return 0;
+
+	/* only update if doing so due to sparse.inTree. */
+	if (!git_config_get_value(SPARSE_CHECKOUT_IN_TREE, &first_value) &&
+	    first_value)
+	    	return update_working_directory(NULL);
+	return 0;
+}
+
 static char *escaped_pattern(char *pattern)
 {
 	char *p = pattern;
@@ -273,6 +296,7 @@ int write_patterns_and_update(struct pattern_list *pl)
 		free(sparse_filename);
 		clear_pattern_list(pl);
 		update_working_directory(NULL);
+		updating_sparse_checkout = 0;
 		return result;
 	}
 
diff --git a/sparse-checkout.h b/sparse-checkout.h
index 993a5701a60..fb0ba48524a 100644
--- a/sparse-checkout.h
+++ b/sparse-checkout.h
@@ -13,6 +13,7 @@ char *get_sparse_checkout_filename(void);
 int populate_sparse_checkout_patterns(struct pattern_list *pl);
 void write_patterns_to_file(FILE *fp, struct pattern_list *pl);
 int update_working_directory(struct pattern_list *pl);
+int update_in_tree_sparse_checkout(void);
 int write_patterns(struct pattern_list *pl, int and_update);
 int write_patterns_and_update(struct pattern_list *pl);
 void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *path);
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 1040bf9c261..fdaafba5377 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -605,6 +605,7 @@ test_expect_success MINGW 'cone mode replaces backslashes with slashes' '
 '
 
 test_expect_success 'basis of --in-tree' '
+	git -C repo branch no-in-tree &&
 	git -C repo config auto.crlf false &&
 	cat >folder1 <<-\EOF &&
 	[sparse]
@@ -690,18 +691,28 @@ test_expect_success '"add" with --in-tree' '
 	check_files repo a deep folder1
 '
 
-test_expect_success 'reapply after updating in-tree file' '
+test_expect_success 'automatically change after updating in-tree file' '
 	git -C repo sparse-checkout set --in-tree .sparse/sparse &&
 	check_files repo a &&
 	test_path_is_dir repo/.sparse &&
-	echo "\tdir = folder1" >>repo/.sparse/sparse &&
+	printf "\tdir = folder1\n" >>repo/.sparse/sparse &&
 	git -C repo commit -a -m "Update sparse file" &&
-	git -C repo sparse-checkout reapply &&
 	check_files repo a folder1 &&
 	test_path_is_dir repo/.sparse &&
 	git -C repo checkout HEAD~1 &&
-	git -C repo sparse-checkout reapply &&
 	check_files repo a &&
+	test_path_is_dir repo/.sparse &&
+	git -C repo checkout - &&
+	check_files repo a folder1 &&
+	test_path_is_dir repo/.sparse
+'
+
+test_expect_success 'keep definition when in-tree file is missing' '
+	git -C repo checkout no-in-tree &&
+	check_files repo a folder1 &&
+	test_path_is_missing repo/.sparse &&
+	git -C repo checkout - &&
+	check_files repo a folder1 &&
 	test_path_is_dir repo/.sparse
 '
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-05-07 13:17 ` [PATCH 05/10] sparse-checkout: automatically update in-tree definition Derrick Stolee via GitGitGadget
@ 2020-05-07 13:17 ` Derrick Stolee via GitGitGadget
  2020-05-20 16:40   ` Elijah Newren
  2020-05-07 13:17 ` [PATCH 07/10] sparse-checkout: define in-tree dependencies Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we parse the in-tree config files that store the sparse.dir values
used to create an in-tree sparse-checkout definition, we can easily
avoid parsing the same file multiple times by using an oidset on those
blobs. We only parse if the oid is new to the oidset.

This is unlikely to have a major performance benefit right now, but will
be extremely important when we introduce the sparse.inherit options to
link multiple files in a directed graph. This oidset will prevent
infinite loops when cycles exist in that digraph, or exponential blowups
even in the case of a directed acyclic graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-checkout.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/sparse-checkout.c b/sparse-checkout.c
index 6c58fda9722..d01f4d7b525 100644
--- a/sparse-checkout.c
+++ b/sparse-checkout.c
@@ -9,6 +9,7 @@
 #include "string-list.h"
 #include "unpack-trees.h"
 #include "object-store.h"
+#include "oidset.h"
 
 char *get_sparse_checkout_filename(void)
 {
@@ -77,9 +78,12 @@ int load_in_tree_pattern_list(struct repository *r,
 			      struct string_list *sl,
 			      struct pattern_list *pl)
 {
+	int result = 0;
 	struct index_state *istate = r->index;
 	struct string_list_item *item;
 	struct strbuf path = STRBUF_INIT;
+	struct oidset set;
+	oidset_init(&set, 16);
 
 	pl->use_cone_patterns = 1;
 
@@ -96,24 +100,34 @@ int load_in_tree_pattern_list(struct repository *r,
 		 * Use -1 return to ensure populate_from_existing_patterns()
 		 * skips the sparse-checkout updates.
 		 */
-		if (pos < 0)
-			return -1;
+		if (pos < 0) {
+			result = -1;
+			goto cleanup;
+		}
 
 		oid = &istate->cache[pos]->oid;
+
+		if (oidset_contains(&set, oid))
+			continue;
+
+		oidset_insert(&set, oid);
+
 		type = oid_object_info(r, oid, NULL);
 
 		if (type != OBJ_BLOB) {
 			warning(_("expected a file at '%s'; not updating sparse-checkout"),
 				oid_to_hex(oid));
-			return 1;
+			result = 1;
+			goto cleanup;
 		}
 
 		load_in_tree_from_blob(pl, oid);
 	}
 
+cleanup:
 	strbuf_release(&path);
-
-	return 0;
+	oidset_clear(&set);
+	return result;
 }
 
 int populate_sparse_checkout_patterns(struct pattern_list *pl)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/10] sparse-checkout: define in-tree dependencies
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2020-05-07 13:17 ` [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs Derrick Stolee via GitGitGadget
@ 2020-05-07 13:17 ` Derrick Stolee via GitGitGadget
  2020-05-20 18:10   ` Elijah Newren
  2020-05-07 13:17 ` [PATCH 08/10] Makefile: skip git-gui if dir is missing Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As mentioned in the definition of the in-tree sparse-checkout
definitions, a large repository could have many different roles for the
users of that repository. That means that many different in-tree
sparse-checkout definitions need to exist. If a directory needs to be
added to many of these roles, then many of these files need to be
edited. This is too much work to scale properly.

Instead, let's make these definitions easier to maintain by following
basic principles of build dependencies. Add simple links between these
in-tree files using the new "sparse.inherit" config key. This multi-
valued config setting specifies more files that are needed to define the
sparse-checkout for a role. By separating out the common directories
into a base file, the more specialized roles can focus their files on
the directories only needed by those roles and leave the common
directories to that base file.

For example, suppose that .sparse/base has the following data:

	[sparse]
		dir = A
		dir = B/C
		dir = D/E/F

and .sparse/extra has the following data:

	[sparse]
		dir = D
		dir = X
		inherit = .sparse/base

Now, a user can run "git sparse-checkout set --in-tree .sparse/extra"
and the resulting directories will define their cone-mode
sparse-checkout patterns:

	A
	B/C
	D
	X

Thus, the resulting sparse-checkout definition is the union of the
definitions from .sparse/base and .sparse/extra, but the user only has
one value of sparse.inTree of ".sparse/extra".

It is simple to modify our existing logic to explore the directed graph
created by the sparse.inherit values. We simply need to append to the
string_list containing the list of in-tree files. Since we never repeat
parsing on the same blob oid, this will not lead to infinite loops or
exponential blowups in the parsing time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt | 26 ++++++++++++++++++++++
 sparse-checkout.c                     | 27 ++++++++++++++++++-----
 sparse-checkout.h                     |  1 +
 t/t1091-sparse-checkout-builtin.sh    | 31 +++++++++++++++++++++++++++
 4 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index c1713ebb1d2..941658e0011 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -238,6 +238,32 @@ definition according to the files available at the new commit. If any of
 the specified files do not exist at the new commit, then the sparse-checkout
 definition will not change.
 
+In a very large repository, there may be need to have multiple of these
+in-tree sparse-checkout definitions to fit the roles of multiple types of
+users. As each definition grows and the number of user types grow, it can
+be difficult to manage updating all of the definitions when a common core
+is modified somehow. For this reason, the in-tree pattern sets can inherit
+the directories from other in-tree pattern sets. Use the `sparse.inherit`
+option to specify other files in the tree.
+
+For example, suppose the file listed earlier is at `.sparse/core`. Another
+file could be stored as `.sparse/extra` with contents
+
+----------------------------------
+[sparse]
+	dir = X
+	dir = Y/Z
+	inherit = .sparse/core
+----------------------------------
+
+Then, if you run `git sparse-checkout set --in-tree .sparse/extra`, the
+sparse-checkout definition will include `X` and `Y/Z` from `.sparse/extra`
+as well as `A`, `B/C`, and `D/E/F` from `.sparse/core`. This is similar
+to specifying both `.sparse/core` and `.sparse/extra` in the `set`
+subcommand, but has a slight advantage. If the `.sparse/extra` file changes
+the set of `inherit` files, then your sparse-checkout definition will
+update accordingly as you switch between commits.
+
 
 SUBMODULES
 ----------
diff --git a/sparse-checkout.c b/sparse-checkout.c
index d01f4d7b525..4edeab49a10 100644
--- a/sparse-checkout.c
+++ b/sparse-checkout.c
@@ -66,12 +66,29 @@ static int sparse_dir_cb(const char *var, const char *value, void *data)
 	return 0;
 }
 
+static int sparse_inherit_cb(const char *var, const char *value, void *data)
+{
+	struct string_list *sl = (struct string_list *)data;
+
+	if (!strcmp(var, SPARSE_CHECKOUT_INHERIT))
+		string_list_append(sl, value);
+
+	return 0;
+}
+
 static int load_in_tree_from_blob(struct pattern_list *pl,
-				  struct object_id *oid)
+				  struct object_id *oid,
+				  struct string_list *inherit)
 {
-	return git_config_from_blob_oid(sparse_dir_cb,
-					SPARSE_CHECKOUT_DIR,
-					oid, pl);
+	if (git_config_from_blob_oid(sparse_dir_cb,
+				     SPARSE_CHECKOUT_DIR,
+				     oid, pl))
+		return 1;
+	if (git_config_from_blob_oid(sparse_inherit_cb,
+				     SPARSE_CHECKOUT_INHERIT,
+				     oid, inherit))
+		return 1;
+	return 0;
 }
 
 int load_in_tree_pattern_list(struct repository *r,
@@ -121,7 +138,7 @@ int load_in_tree_pattern_list(struct repository *r,
 			goto cleanup;
 		}
 
-		load_in_tree_from_blob(pl, oid);
+		load_in_tree_from_blob(pl, oid, sl);
 	}
 
 cleanup:
diff --git a/sparse-checkout.h b/sparse-checkout.h
index fb0ba48524a..8b766ea38fb 100644
--- a/sparse-checkout.h
+++ b/sparse-checkout.h
@@ -5,6 +5,7 @@
 #include "repository.h"
 
 #define SPARSE_CHECKOUT_DIR "sparse.dir"
+#define SPARSE_CHECKOUT_INHERIT "sparse.inherit"
 #define SPARSE_CHECKOUT_IN_TREE "sparse.intree"
 
 struct pattern_list;
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index fdaafba5377..b2389e5b5e6 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -716,4 +716,35 @@ test_expect_success 'keep definition when in-tree file is missing' '
 	test_path_is_dir repo/.sparse
 '
 
+test_expect_success 'inherit definition from other files' '
+	cat >repo/.sparse/inherit <<-EOF &&
+	[sparse]
+		inherit = .sparse/sparse
+		inherit = .sparse/deep
+		inherit = .sparse/deeper1
+	EOF
+	git -C repo add .sparse &&
+	git -C repo commit -m "Add inherited file" &&
+	git -C repo sparse-checkout set --in-tree .sparse/inherit &&
+	check_files repo a deep folder1 &&
+	check_files repo/deep a deeper1 deeper2 &&
+	test_path_is_dir repo/.sparse &&
+	cat >repo/.sparse/sparse <<-EOF &&
+	[sparse]
+		dir = .sparse
+	EOF
+	git -C repo commit -a -m "drop folder1 from sparse" &&
+	check_files repo a deep &&
+	check_files repo/deep a deeper1 deeper2 &&
+	test_path_is_dir repo/.sparse
+'
+
+test_expect_success 'inherit files can have cycles' '
+	echo "\tinherit = .sparse/inherit" >>repo/.sparse/sparse &&
+	git -C repo commit -a -m "create inherit cycle" &&
+	check_files repo a deep &&
+	check_files repo/deep a deeper1 deeper2 &&
+	test_path_is_dir repo/.sparse
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/10] Makefile: skip git-gui if dir is missing
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2020-05-07 13:17 ` [PATCH 07/10] sparse-checkout: define in-tree dependencies Derrick Stolee via GitGitGadget
@ 2020-05-07 13:17 ` Derrick Stolee via GitGitGadget
  2020-05-07 13:17 ` [PATCH 09/10] Makefile: disable GETTEXT when 'po' " Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As an effort to promote "dogfooding" the sparse-checkout feature within
the Git codebase, it is helpful to explore which portions of the
codebase are optional.

The NO_TCLTK variable in the Makefile allows ignoring the git-gui
application at build time. If the local sparse-checkout removes that
directory, then the build will fail without manually defining that
variable. Instead, check to see if the directory exists at build time to
see if we can automatically ignore this directory.

With this change, the following list of directories can be supplied to
"git sparse-checkout set" and Git will build and test on Linux:

	Documentation
	builtin
	compat/.depend
	contrib
	ewah
	mergetools
	negotiator
	perl
	refs
	sha1dc
	sha256
	t
	templates
	trace2
	vcs-svn
	xdiff

The "make install" command requires the "gitweb" and "po" directories.
The "po" directory will become optional in the next change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Makefile | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 2e8029b8026..70760d315cb 100644
--- a/Makefile
+++ b/Makefile
@@ -313,7 +313,8 @@ all::
 #
 # Define NO_PYTHON if you do not want Python scripts or libraries at all.
 #
-# Define NO_TCLTK if you do not want Tcl/Tk GUI.
+# Define NO_TCLTK if you do not want Tcl/Tk GUI. This will also be defined
+# if the git-gui directory is missing.
 #
 # Define SANE_TEXT_GREP to "-a" if you use recent versions of GNU grep
 # and egrep that are pickier when their input contains non-ASCII data.
@@ -1860,6 +1861,12 @@ ifeq ($(TCLTK_PATH),)
 NO_TCLTK = NoThanks
 endif
 
+ifndef NO_TCLTK
+	ifeq ($(ls git-gui),)
+		NO_TCLTK = NoThanks
+	endif
+endif
+
 ifeq ($(PERL_PATH),)
 NO_PERL = NoThanks
 endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/10] Makefile: disable GETTEXT when 'po' is missing
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2020-05-07 13:17 ` [PATCH 08/10] Makefile: skip git-gui if dir is missing Derrick Stolee via GitGitGadget
@ 2020-05-07 13:17 ` Derrick Stolee via GitGitGadget
  2020-05-07 13:17 ` [PATCH 10/10] .sparse: add in-tree sparse-checkout for Git Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As an effort to promote "dogfooding" the sparse-checkout feature within
the Git codebase, it is helpful to explore which portions of the
codebase are optional.

The NO_GETTEXT build variable can disable the translation libraries.
The test suite then uses the NO_GETTEXT environment variable to disable
the GETTEXT prerequisite for some tests.

The 'po' directory contains translations for strings in the Git
codebase. While this stores extremely important data for Git users, the
data is not typically used by Git contributors in their daily work.
Thus, it could be removed from the working directory using
sparse-checkout.

However, doing such a removal causes some tests to fail. Part of the
failures are related to the GETTEXT prerequisite being enabled when it
probably should not be. The other part is that some tests in
t0200-gettext-basic.sh depend on the existence of files in the po
directory.

In test-lib, disable the GETTEXT prerequisite when the po directory does
not exist, then add the GETTEXT prerequisite to these tests that use
files in that directory.

In Makefile, define NO_GETTEXT when the po directory does not exist.
This is necessary for "make install" to work correctly.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Makefile                 | 9 ++++++++-
 t/lib-gettext.sh         | 1 -
 t/t0200-gettext-basic.sh | 6 +++---
 t/test-lib.sh            | 7 ++++++-
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 70760d315cb..38c2d54349a 100644
--- a/Makefile
+++ b/Makefile
@@ -72,7 +72,8 @@ all::
 #
 # Define NO_GETTEXT if you don't want Git output to be translated.
 # A translated Git requires GNU libintl or another gettext implementation,
-# plus libintl-perl at runtime.
+# plus libintl-perl at runtime. This will also be defined if the 'po'
+# directory is missing.
 #
 # Define USE_GETTEXT_SCHEME and set it to 'fallthrough', if you don't trust
 # the installed gettext translation of the shell scripts output.
@@ -1861,6 +1862,12 @@ ifeq ($(TCLTK_PATH),)
 NO_TCLTK = NoThanks
 endif
 
+ifndef NO_GETTEXT
+	ifeq ($(ls po),)
+		NO_GETTEXT = NoThanks
+	endif
+endif
+
 ifndef NO_TCLTK
 	ifeq ($(ls git-gui),)
 		NO_TCLTK = NoThanks
diff --git a/t/lib-gettext.sh b/t/lib-gettext.sh
index 2139b427ca1..beeb45a1387 100644
--- a/t/lib-gettext.sh
+++ b/t/lib-gettext.sh
@@ -7,7 +7,6 @@
 . ./test-lib.sh
 
 GIT_TEXTDOMAINDIR="$GIT_BUILD_DIR/po/build/locale"
-GIT_PO_PATH="$GIT_BUILD_DIR/po"
 export GIT_TEXTDOMAINDIR GIT_PO_PATH
 
 if test -n "$GIT_TEST_INSTALLED"
diff --git a/t/t0200-gettext-basic.sh b/t/t0200-gettext-basic.sh
index 8853d8afb92..0f6bc941cbb 100755
--- a/t/t0200-gettext-basic.sh
+++ b/t/t0200-gettext-basic.sh
@@ -15,17 +15,17 @@ test_expect_success 'sanity: $TEXTDOMAIN is git' '
     test $TEXTDOMAIN = "git"
 '
 
-test_expect_success 'xgettext sanity: Perl _() strings are not extracted' '
+test_expect_success GETTEXT 'xgettext sanity: Perl _() strings are not extracted' '
     ! grep "A Perl string xgettext will not get" "$GIT_PO_PATH"/is.po
 '
 
-test_expect_success 'xgettext sanity: Comment extraction with --add-comments' '
+test_expect_success GETTEXT 'xgettext sanity: Comment extraction with --add-comments' '
     grep "TRANSLATORS: This is a test" "$TEST_DIRECTORY"/t0200/* | wc -l >expect &&
     grep "TRANSLATORS: This is a test" "$GIT_PO_PATH"/is.po  | wc -l >actual &&
     test_cmp expect actual
 '
 
-test_expect_success 'xgettext sanity: Comment extraction with --add-comments stops at statements' '
+test_expect_success  GETTEXT 'xgettext sanity: Comment extraction with --add-comments stops at statements' '
     ! grep "This is a phony" "$GIT_PO_PATH"/is.po &&
     ! grep "the above comment" "$GIT_PO_PATH"/is.po
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0ea1e5a05ed..ca22d23f0d2 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1474,7 +1474,12 @@ test -z "$NO_PYTHON" && test_set_prereq PYTHON
 test -n "$USE_LIBPCRE1$USE_LIBPCRE2" && test_set_prereq PCRE
 test -n "$USE_LIBPCRE1" && test_set_prereq LIBPCRE1
 test -n "$USE_LIBPCRE2" && test_set_prereq LIBPCRE2
-test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
+
+GIT_PO_PATH="$GIT_BUILD_DIR/po"
+if test -d "$GIT_PO_PATH"
+then
+	test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
+fi
 
 if test -n "$GIT_TEST_GETTEXT_POISON_ORIG"
 then
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/10] .sparse: add in-tree sparse-checkout for Git
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2020-05-07 13:17 ` [PATCH 09/10] Makefile: disable GETTEXT when 'po' " Derrick Stolee via GitGitGadget
@ 2020-05-07 13:17 ` Derrick Stolee via GitGitGadget
  2020-05-20 17:38 ` [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Elijah Newren
  2020-06-17 23:14 ` Elijah Newren
  11 siblings, 0 replies; 32+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-07 13:17 UTC (permalink / raw)
  To: git; +Cc: newren, peff, me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The in-tree sparse-checkout feature allows architects to adapt their
dependentent subsystems into sparse-checkout specificiations. This helps
the typical users who do not know the full build system to use the
sparse-chekcout in a pain-free way. In particular, if the
sparse-checkout dependencies update at the same times as build
dependencies, then the users automatically get new sparse-checkout
definitions as they switch branches.

For the Git codebase, it is not immediately obvious which directories
are absolutely required for building Git. From my estimation, the
necessary directories for building and testing Git on Linux are listed
in the .sparse/base.deps file in this change. A few more directories in
the compat/ dir are required for building on Windows.

This presents a new possible workflow for Git contributors, especially
those that want to test several new features in their workflow. The
following allows a user to set up working on Git with partial clone and
sparse-checkout:

 $ git clone --sparse --filter=blob:none https://github.com/git/git
 $ cd git
 $ git sparse-checkout set --in-tree .sparse/base.deps
 $ make test

Perhaps there are ways we can further reduce the size of the "bare
necessities" by rearranging code and adjusting the Makefile to match. In
particular, I noticed that the vcs-svn directory is required for the
build. It could be helpful to create a way to build a very small version
of "core Git" that doesn't include integrations with other version
control systems, and having them automatically disabled if they are
missing from the sparse-checkout definition would be a great way to make
that more accessible for contributors.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .sparse/base.deps    | 19 +++++++++++++++++++
 .sparse/windows.deps |  3 +++
 2 files changed, 22 insertions(+)
 create mode 100644 .sparse/base.deps
 create mode 100644 .sparse/windows.deps

diff --git a/.sparse/base.deps b/.sparse/base.deps
new file mode 100644
index 00000000000..b0682175dc3
--- /dev/null
+++ b/.sparse/base.deps
@@ -0,0 +1,19 @@
+[sparse]
+	dir = Documentation
+	dir = block-sha1
+	dir = builtin
+	dir = compat/.depend
+	dir = contrib
+	dir = ewah
+	dir = gitweb
+	dir = mergetools
+	dir = negotiator
+	dir = perl
+	dir = refs
+	dir = sha1dc
+	dir = sha256
+	dir = t
+	dir = templates
+	dir = trace2
+	dir = vcs-svn
+	dir = xdiff
diff --git a/.sparse/windows.deps b/.sparse/windows.deps
new file mode 100644
index 00000000000..6c9bf1df335
--- /dev/null
+++ b/.sparse/windows.deps
@@ -0,0 +1,3 @@
+[sparse]
+	inherit = .sparse/base.deps
+	dir = compat
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/10] unpack-trees: avoid array out-of-bounds error
  2020-05-07 13:17 ` [PATCH 01/10] unpack-trees: avoid array out-of-bounds error Derrick Stolee via GitGitGadget
@ 2020-05-07 22:27   ` Junio C Hamano
  2020-05-08 12:19     ` Derrick Stolee
  2020-05-20 16:32     ` Elijah Newren
  0 siblings, 2 replies; 32+ messages in thread
From: Junio C Hamano @ 2020-05-07 22:27 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, peff, me, jrnieder, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The loop in warn_conflicted_path() that checks for the count of entries
> with the same path uses "i+count" for the array entry. However, the loop
> only verifies that the value of count is below the array size. Fix this
> by adding i to the condition.
>
> I hit this condition during a test of the in-tree sparse-checkout
> feature, so it is exercised by the end of the series.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  unpack-trees.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 9a3ccd9d083..4f880f2da90 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -563,10 +563,11 @@ static int warn_conflicted_path(struct index_state *istate,
>  	add_rejected_path(o, WARNING_SPARSE_UNMERGED_FILE, conflicting_path);
>  
>  	/* Find out how many higher stage entries at same path */
> -	while (++count < istate->cache_nr &&
> +	while (i + ++count < istate->cache_nr &&
>  	       !strcmp(conflicting_path,
>  		       istate->cache[i+count]->name))
>  		/* do nothing */;

Eek.  Yes, it is obvious that the original is wrong once you point
it out.  But "i + ++count" looks like a line noise, and funny way
that lines are wrapped in the original does not help X-<.

We may want to fix the style and the grammar while we are at it,
perhaps like the attached.

In any case, thanks for a fix.

 unpack-trees.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 6bbf58d28e..c38938d96c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -562,11 +562,11 @@ static int warn_conflicted_path(struct index_state *istate,
 
 	add_rejected_path(o, WARNING_SPARSE_UNMERGED_FILE, conflicting_path);
 
-	/* Find out how many higher stage entries at same path */
-	while (++count < istate->cache_nr &&
-	       !strcmp(conflicting_path,
-		       istate->cache[i+count]->name))
-		/* do nothing */;
+	/* Find out how many higher stage entries are at same path */
+	while ((++count) + i < istate->cache_nr &&
+	       !strcmp(conflicting_path, istate->cache[count + i]->name))
+		; /* do nothing */
+
 	return count;
 }
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] sparse-checkout: allow in-tree definitions
  2020-05-07 13:17 ` [PATCH 04/10] sparse-checkout: allow in-tree definitions Derrick Stolee via GitGitGadget
@ 2020-05-07 22:58   ` Junio C Hamano
  2020-05-08 15:40     ` Derrick Stolee
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2020-05-07 22:58 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, peff, me, jrnieder, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> One of the difficulties of using the sparse-checkout feature is not
> knowing which directories are absolutely needed for working in a portion
> of the repository. Some of this can be documented in README files or
> included in a bootstrapping tool along with the repository. This is done
> in an ad-hoc way by every project that wants to use it.
>
> Let's make this process easier for users by creating a way to define a
> useful sparse-checkout definition inside the Git tree data. This has
> several benefits. In particular, the data is available to anyone who has
> a copy of the repository without needing a different data source.
> Second, the needs of the repository can change over time and Git can
> present a way to automatically update the working directory as these
> sparse-checkout definitions change over time.

And two lines of development can merge them together?

Any time a new "feature" pops up that would eventually affect how
"git clone" and "git checkout" work based on untrusted user data, we
need to make sure there is no negative security implications.  

If it only boils down to "we have files that can record list of
leading directory names and without offering extra 'flexibility'", I
guess there aren't all that much that a malicious sparse definition
can do and we would be safe, though.

> To use this feature, add the "--in-tree" option when setting or adding
> directories to the sparse-checkout definition. For example:
>
>   $ git sparse-checkout set --in-tree .sparse/base
>   $ git sparse-checkout add --in-tree .sparse/extra
>
> These commands add values to the multi-valued config setting
> "sparse.inTree". When updating the sparse-checkout definition, these
> values describe paths in the repository to find the sparse-checkout
> data. After the commands listed earlier, we expect to see the following
> in .git/config.worktree:
>
> 	[sparse]
> 		intree = .sparse/base
> 		intree = .sparse/extra

What does this say in human words?  "These two tracked files specify
which paths should be in the working tree"?  Spelling it out here
would help readers of this commit.

> When applying the sparse-checkout definitions from this config, the
> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded. 

OK, so end-user edit to the working tree copy or what is added to
the index does not count and only the committed version gets used.

That makes it simple---I was wondering how we would operate when
merging a branch with different contents in the .sparse/* files
until the conflicts are resolved.

> In those
> files, the multi-valued config values "sparse.dir" are considered as
> the directories to construct a cone mode sparse-checkout file. The end
> result is as if these paths were provided to "git sparse-checkout set"
> in cone mode.

OK.

> For example, suppose .sparse/base had the following content:
>
> 	[sparse]
> 		dir = A
> 		dir = B/C
> 		dir = D/E/F
>
> and .sparse/extra had the following content:
>
> 	[sparse]
> 		dir = D
> 		dir = X
>
> Then, the output of "git sparse-checkout list" would be
>
> 	A
> 	B/C
> 	D
> 	X
>
> Note that since "D" contains "D/E/F", that directory replaces the
> position of "D/E/F" in the list.
>
> Since these are parsed using the config library, the parser is robust
> enough to understand comments and complicated string values.
>
> The key benefit to this approach is that it can be extended by defining
> new config values. In a later change, we will introduce "sparse.inherit"
> to point to another file in the tree. This will solve the problem of
> editing many files when core dependencies change.

With only a multi-valued sparse.dir elements allowed in these
in-tree .sparse/* files, I guess there isn't much mischeaf a
malicious .sparse/* file can do.  Can it try to [includeIf] some
paths external to the repository to cause a remote attacker to learn
about the paths on the local system, perhaps? 

> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> new file mode 100644
> index 00000000000..c1fce87cd33
> --- /dev/null
> +++ b/Documentation/config/sparse.txt
> @@ -0,0 +1,15 @@
> +sparse.inTree::
> +	The `core.sparseCheckout` config option enables the `sparse-checkout`
> +	feature, but if there are any values for the multi-valued
> +	`sparse.inTree` config option, then the sparse-checkout patterns are
> +	defined by parsing the files listed in these values. See
> +	linkgit:git-sparse-checkout[1] for more information.

Does "... but if ... then" mean "by parsing the files and ignoring
all other things that may otherwise define patterns"?

> +sparse.dir::
> +	This config setting is ignored if present in the repository config.
> +	Instead, this multi-valued option is present in the files listed by
> +	`sparse.inTree` and specifies the directories needed in the
> +	working directory. The union of all `sparse.dir` values across all
> +	`sparse.inTree` files forms the input for `git sparse-checkout set`
> +	in cone mode.  See linkgit:git-sparse-checkout[1] for more
> +	information.

If this is *not* a config in the usual sense, we probably should not
include it in this document and in the "git config --help" output.
That will allow us to drop the first sentence.

Those .sparse/* in-tree files are like .gitmodules in the sense that
they happen to use the same syntax so that the parser can be shared,
but they are not allowed to affect end-user configuration
(e.g. writing "[diff] external=rm -fr ." in the file has no effect)
at all, right?

And we should describe what we can write in these in-tree files
separately and make it clear that they are _different_ from the
configuration variables.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/10] unpack-trees: avoid array out-of-bounds error
  2020-05-07 22:27   ` Junio C Hamano
@ 2020-05-08 12:19     ` Derrick Stolee
  2020-05-08 15:09       ` Junio C Hamano
  2020-05-20 16:32     ` Elijah Newren
  1 sibling, 1 reply; 32+ messages in thread
From: Derrick Stolee @ 2020-05-08 12:19 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, newren, peff, me, jrnieder, Derrick Stolee

On 5/7/2020 6:27 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 9a3ccd9d083..4f880f2da90 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -563,10 +563,11 @@ static int warn_conflicted_path(struct index_state *istate,
>>  	add_rejected_path(o, WARNING_SPARSE_UNMERGED_FILE, conflicting_path);
>>  
>>  	/* Find out how many higher stage entries at same path */
>> -	while (++count < istate->cache_nr &&
>> +	while (i + ++count < istate->cache_nr &&
>>  	       !strcmp(conflicting_path,
>>  		       istate->cache[i+count]->name))
>>  		/* do nothing */;
> 
> Eek.  Yes, it is obvious that the original is wrong once you point
> it out.  But "i + ++count" looks like a line noise, and funny way
> that lines are wrapped in the original does not help X-<.
> 
> We may want to fix the style and the grammar while we are at it,
> perhaps like the attached.
> 
> In any case, thanks for a fix.
> 
>  unpack-trees.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 6bbf58d28e..c38938d96c 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -562,11 +562,11 @@ static int warn_conflicted_path(struct index_state *istate,
>  
>  	add_rejected_path(o, WARNING_SPARSE_UNMERGED_FILE, conflicting_path);
>  
> -	/* Find out how many higher stage entries at same path */
> -	while (++count < istate->cache_nr &&
> -	       !strcmp(conflicting_path,
> -		       istate->cache[i+count]->name))
> -		/* do nothing */;
> +	/* Find out how many higher stage entries are at same path */
> +	while ((++count) + i < istate->cache_nr &&
> +	       !strcmp(conflicting_path, istate->cache[count + i]->name))
> +		; /* do nothing */
> +

This looks much better, thanks!

As I mentioned in the cover letter, this is worth taking on its own. Could
you queue the collaborative patch? I'll eject it from the next version of
this series. 

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/10] unpack-trees: avoid array out-of-bounds error
  2020-05-08 12:19     ` Derrick Stolee
@ 2020-05-08 15:09       ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2020-05-08 15:09 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, newren, peff, me, jrnieder,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> As I mentioned in the cover letter, this is worth taking on its own. Could
> you queue the collaborative patch? I'll eject it from the next version of
> this series. 

Yes, I think this is worth taking independently.

I do not think people would find the updated version "much better",
but it made it readable at least to me.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] sparse-checkout: allow in-tree definitions
  2020-05-07 22:58   ` Junio C Hamano
@ 2020-05-08 15:40     ` Derrick Stolee
  2020-05-20 17:52       ` Elijah Newren
  0 siblings, 1 reply; 32+ messages in thread
From: Derrick Stolee @ 2020-05-08 15:40 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, newren, peff, me, jrnieder, Derrick Stolee

On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> One of the difficulties of using the sparse-checkout feature is not
>> knowing which directories are absolutely needed for working in a portion
>> of the repository. Some of this can be documented in README files or
>> included in a bootstrapping tool along with the repository. This is done
>> in an ad-hoc way by every project that wants to use it.
>>
>> Let's make this process easier for users by creating a way to define a
>> useful sparse-checkout definition inside the Git tree data. This has
>> several benefits. In particular, the data is available to anyone who has
>> a copy of the repository without needing a different data source.
>> Second, the needs of the repository can change over time and Git can
>> present a way to automatically update the working directory as these
>> sparse-checkout definitions change over time.
> 
> And two lines of development can merge them together?
> 
> Any time a new "feature" pops up that would eventually affect how
> "git clone" and "git checkout" work based on untrusted user data, we
> need to make sure there is no negative security implications.  
> 
> If it only boils down to "we have files that can record list of
> leading directory names and without offering extra 'flexibility'", I
> guess there aren't all that much that a malicious sparse definition
> can do and we would be safe, though.

Yes. I hope that we can be extremely careful with this feature.
The RFC status of this series implicitly includes the question
"Should we do this at all?" I think the benefits outweigh the
risks, but we can minimize those risks with very careful design
and implementation.

>> To use this feature, add the "--in-tree" option when setting or adding
>> directories to the sparse-checkout definition. For example:
>>
>>   $ git sparse-checkout set --in-tree .sparse/base
>>   $ git sparse-checkout add --in-tree .sparse/extra
>>
>> These commands add values to the multi-valued config setting
>> "sparse.inTree". When updating the sparse-checkout definition, these
>> values describe paths in the repository to find the sparse-checkout
>> data. After the commands listed earlier, we expect to see the following
>> in .git/config.worktree:
>>
>> 	[sparse]
>> 		intree = .sparse/base
>> 		intree = .sparse/extra
> 
> What does this say in human words?  "These two tracked files specify
> which paths should be in the working tree"?  Spelling it out here
> would help readers of this commit.

You got it. Sounds good.

>> When applying the sparse-checkout definitions from this config, the
>> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded. 
> 
> OK, so end-user edit to the working tree copy or what is added to
> the index does not count and only the committed version gets used.
> 
> That makes it simple---I was wondering how we would operate when
> merging a branch with different contents in the .sparse/* files
> until the conflicts are resolved.

It's worth testing this case so we can be sure what happens.

>> In those
>> files, the multi-valued config values "sparse.dir" are considered as
>> the directories to construct a cone mode sparse-checkout file. The end
>> result is as if these paths were provided to "git sparse-checkout set"
>> in cone mode.
> 
> OK.
> 
>> For example, suppose .sparse/base had the following content:
>>
>> 	[sparse]
>> 		dir = A
>> 		dir = B/C
>> 		dir = D/E/F
>>
>> and .sparse/extra had the following content:
>>
>> 	[sparse]
>> 		dir = D
>> 		dir = X
>>
>> Then, the output of "git sparse-checkout list" would be
>>
>> 	A
>> 	B/C
>> 	D
>> 	X
>>
>> Note that since "D" contains "D/E/F", that directory replaces the
>> position of "D/E/F" in the list.
>>
>> Since these are parsed using the config library, the parser is robust
>> enough to understand comments and complicated string values.
>>
>> The key benefit to this approach is that it can be extended by defining
>> new config values. In a later change, we will introduce "sparse.inherit"
>> to point to another file in the tree. This will solve the problem of
>> editing many files when core dependencies change.
> 
> With only a multi-valued sparse.dir elements allowed in these
> in-tree .sparse/* files, I guess there isn't much mischeaf a
> malicious .sparse/* file can do.  Can it try to [includeIf] some
> paths external to the repository to cause a remote attacker to learn
> about the paths on the local system, perhaps? 

I was unaware of includes in the config format [1]. While the behavior
should be safe because we are only pulling very specific data from the
config, it would be best to see if we can disable includes when reading
config from a blob.

[1] https://git-scm.com/docs/git-config#_includes

> 
>> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
>> new file mode 100644
>> index 00000000000..c1fce87cd33
>> --- /dev/null
>> +++ b/Documentation/config/sparse.txt
>> @@ -0,0 +1,15 @@
>> +sparse.inTree::
>> +	The `core.sparseCheckout` config option enables the `sparse-checkout`
>> +	feature, but if there are any values for the multi-valued
>> +	`sparse.inTree` config option, then the sparse-checkout patterns are
>> +	defined by parsing the files listed in these values. See
>> +	linkgit:git-sparse-checkout[1] for more information.
> 
> Does "... but if ... then" mean "by parsing the files and ignoring
> all other things that may otherwise define patterns"?
> 
>> +sparse.dir::
>> +	This config setting is ignored if present in the repository config.
>> +	Instead, this multi-valued option is present in the files listed by
>> +	`sparse.inTree` and specifies the directories needed in the
>> +	working directory. The union of all `sparse.dir` values across all
>> +	`sparse.inTree` files forms the input for `git sparse-checkout set`
>> +	in cone mode.  See linkgit:git-sparse-checkout[1] for more
>> +	information.
> 
> If this is *not* a config in the usual sense, we probably should not
> include it in this document and in the "git config --help" output.
> That will allow us to drop the first sentence.
> 
> Those .sparse/* in-tree files are like .gitmodules in the sense that
> they happen to use the same syntax so that the parser can be shared,
> but they are not allowed to affect end-user configuration
> (e.g. writing "[diff] external=rm -fr ." in the file has no effect)
> at all, right?
> 
> And we should describe what we can write in these in-tree files
> separately and make it clear that they are _different_ from the
> configuration variables.

I can move the "keys" from the config documentation and into the
git-sparse-checkout.txt file.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/10] sparse-checkout: automatically update in-tree definition
  2020-05-07 13:17 ` [PATCH 05/10] sparse-checkout: automatically update in-tree definition Derrick Stolee via GitGitGadget
@ 2020-05-20 16:28   ` Elijah Newren
  0 siblings, 0 replies; 32+ messages in thread
From: Elijah Newren @ 2020-05-20 16:28 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, newren, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

Hi,

Sorry for the long delay.  I'm trying to read through the patch series
but may comment out of order...

On Thu, May 7, 2020 at 6:21 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> A benefit of having sparse-checkouts defined in the tree data is that
> the sparse-checkout definition can change over time. To take advantage
> of that data, a user could run "git sparse-checkout reapply" whenever
> they thought the sparse-checkout definition was out of date. That is
> likely too manual of a process, since a user would probably only
> discover this after a failure to accomplish their goal, such as a build
> failure.
>
> Prevent user frustration by automatically updating the sparse-checkout
> definition after changing the index when an in-tree definition is
> provided by config. This will happen during every index change,
> including every step of a rebase, including conflict states.

...every step of a rebase _that writes to the index_.  (Currently,
rebase writes to the index and working tree with every commit that is
applied, though that's a waste of time and resources; if the patch
applies cleanly it should be done in-memory and just continue on to
the next one.  That avoids dirtying working tree files unnecessarily
too.  I'm hoping to start working on it again soon...)

> Special care was needed around the --no-sparse-checkout option in "git
> read-tree". This previously relied on changing the skip_sparse_checkout
> option in "struct unpack_trees_options" to prevent applying the
> skip-worktree bits. However, now that we make a second update to the
> index focusing on the skip-worktree bits, this needs to be prevented in
> another way. The simplest thing to do was disable the feature through
> the core_apply_sparse_checkout global variable.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt |  7 ++++---
>  builtin/commit.c                      |  4 +++-
>  builtin/read-tree.c                   |  4 ++++
>  read-cache.c                          |  8 +++++---
>  sparse-checkout.c                     | 26 +++++++++++++++++++++++++-
>  sparse-checkout.h                     |  1 +
>  t/t1091-sparse-checkout-builtin.sh    | 19 +++++++++++++++----
>  7 files changed, 57 insertions(+), 12 deletions(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index da9322c5e41..c1713ebb1d2 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -233,9 +233,10 @@ Use `git sparse-checkout set --in-tree <path>` to initialize the patterns
>  to those included in the file at `<path>`. This will override any existing
>  patterns you have in your sparse-checkout file.
>
> -After switching between commits with different versions of this file, run
> -`git sparse-checkout reapply` to adjust the sparse-checkout patterns to
> -the new definition.
> +As Git switches between commits, it will update the in-tree sparse-checkout
> +definition according to the files available at the new commit. If any of
> +the specified files do not exist at the new commit, then the sparse-checkout
> +definition will not change.
>
>
>  SUBMODULES
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 7ba33a3bec4..0eab8d74469 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -36,6 +36,7 @@
>  #include "help.h"
>  #include "commit-reach.h"
>  #include "commit-graph.h"
> +#include "sparse-checkout.h"
>
>  static const char * const builtin_commit_usage[] = {
>         N_("git commit [<options>] [--] <pathspec>..."),
> @@ -222,7 +223,8 @@ static int commit_index_files(void)
>         case COMMIT_AS_IS:
>                 break; /* nothing to do */
>         case COMMIT_NORMAL:
> -               err = commit_lock_file(&index_lock);
> +               err = commit_lock_file(&index_lock) ||
> +                     update_in_tree_sparse_checkout();
>                 break;
>         case COMMIT_PARTIAL:
>                 err = commit_lock_file(&index_lock);

I like the fact that you found one simple place to change to make sure
all places call the necessary code.  I'm trying to think out loud
about whether that catches everything and whether it has any ill
side-effects.  Not sure if there is one, but...if someone is doing a
merge, a rebase, or even a checkout, then:

* unpack_trees() will apply sparsity patterns (but likely the old ones
that weren't updated yet, right?)
* commit_index_files() will reapply the new sparsity patterns

While I think this doesn't present any problem from a correctness
point of view, is there a performance issue here?  Is there a case
where this will cause a huge amount of unnecessary disk activity, e.g.
when old rules include some big directory that later rules don't?

> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> index af7424b94c8..9ae81ffffa1 100644
> --- a/builtin/read-tree.c
> +++ b/builtin/read-tree.c
> @@ -247,6 +247,10 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>                 parse_tree(tree);
>                 init_tree_desc(t+i, tree->buffer, tree->size);
>         }
> +
> +       if (opts.skip_sparse_checkout)
> +               core_apply_sparse_checkout = 0;
> +
>         if (unpack_trees(nr_trees, t, &opts))
>                 return 128;
>
> diff --git a/read-cache.c b/read-cache.c
> index aa427c5c170..150e73feb0d 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -25,6 +25,7 @@
>  #include "fsmonitor.h"
>  #include "thread-utils.h"
>  #include "progress.h"
> +#include "sparse-checkout.h"
>
>  /* Mask for the name length in ce_flags in the on-disk index */
>
> @@ -3074,9 +3075,10 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>
>         if (ret)
>                 return ret;
> -       if (flags & COMMIT_LOCK)
> -               ret = commit_locked_index(lock);
> -       else
> +       if (flags & COMMIT_LOCK) {
> +               ret = commit_locked_index(lock) ||
> +                     update_in_tree_sparse_checkout();
> +       } else
>                 ret = close_lock_file_gently(lock);
>
>         run_hook_le(NULL, "post-index-change",
> diff --git a/sparse-checkout.c b/sparse-checkout.c
> index d6c27ca19c4..6c58fda9722 100644
> --- a/sparse-checkout.c
> +++ b/sparse-checkout.c
> @@ -92,9 +92,12 @@ int load_in_tree_pattern_list(struct repository *r,
>                  * Exit silently, as this is likely the case where Git
>                  * changed branches to a location where the inherit file
>                  * does not exist. Do not update the sparse-checkout.
> +                *
> +                * Use -1 return to ensure populate_from_existing_patterns()
> +                * skips the sparse-checkout updates.
>                  */
>                 if (pos < 0)
> -                       return 1;
> +                       return -1;
>
>                 oid = &istate->cache[pos]->oid;
>                 type = oid_object_info(r, oid, NULL);
> @@ -145,6 +148,7 @@ int populate_sparse_checkout_patterns(struct pattern_list *pl)
>         return result;
>  }
>
> +static int updating_sparse_checkout = 0;
>  int update_working_directory(struct pattern_list *pl)
>  {
>         enum update_sparsity_result result;
> @@ -152,6 +156,10 @@ int update_working_directory(struct pattern_list *pl)
>         struct lock_file lock_file = LOCK_INIT;
>         struct repository *r = the_repository;
>
> +       if (updating_sparse_checkout)
> +               return 0;
> +       updating_sparse_checkout = 1;
> +
>         memset(&o, 0, sizeof(o));
>         o.verbose_update = isatty(2);
>         o.update = 1;
> @@ -180,9 +188,24 @@ int update_working_directory(struct pattern_list *pl)
>         else
>                 rollback_lock_file(&lock_file);
>
> +       updating_sparse_checkout = 0;
>         return result;
>  }
>
> +int update_in_tree_sparse_checkout(void)
> +{
> +       const char *first_value;
> +
> +       if (!core_apply_sparse_checkout)
> +               return 0;
> +
> +       /* only update if doing so due to sparse.inTree. */
> +       if (!git_config_get_value(SPARSE_CHECKOUT_IN_TREE, &first_value) &&
> +           first_value)
> +               return update_working_directory(NULL);
> +       return 0;
> +}
> +
>  static char *escaped_pattern(char *pattern)
>  {
>         char *p = pattern;
> @@ -273,6 +296,7 @@ int write_patterns_and_update(struct pattern_list *pl)
>                 free(sparse_filename);
>                 clear_pattern_list(pl);
>                 update_working_directory(NULL);
> +               updating_sparse_checkout = 0;
>                 return result;
>         }
>
> diff --git a/sparse-checkout.h b/sparse-checkout.h
> index 993a5701a60..fb0ba48524a 100644
> --- a/sparse-checkout.h
> +++ b/sparse-checkout.h
> @@ -13,6 +13,7 @@ char *get_sparse_checkout_filename(void);
>  int populate_sparse_checkout_patterns(struct pattern_list *pl);
>  void write_patterns_to_file(FILE *fp, struct pattern_list *pl);
>  int update_working_directory(struct pattern_list *pl);
> +int update_in_tree_sparse_checkout(void);
>  int write_patterns(struct pattern_list *pl, int and_update);
>  int write_patterns_and_update(struct pattern_list *pl);
>  void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *path);
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 1040bf9c261..fdaafba5377 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -605,6 +605,7 @@ test_expect_success MINGW 'cone mode replaces backslashes with slashes' '
>  '
>
>  test_expect_success 'basis of --in-tree' '
> +       git -C repo branch no-in-tree &&
>         git -C repo config auto.crlf false &&
>         cat >folder1 <<-\EOF &&
>         [sparse]
> @@ -690,18 +691,28 @@ test_expect_success '"add" with --in-tree' '
>         check_files repo a deep folder1
>  '
>
> -test_expect_success 'reapply after updating in-tree file' '
> +test_expect_success 'automatically change after updating in-tree file' '
>         git -C repo sparse-checkout set --in-tree .sparse/sparse &&
>         check_files repo a &&
>         test_path_is_dir repo/.sparse &&
> -       echo "\tdir = folder1" >>repo/.sparse/sparse &&
> +       printf "\tdir = folder1\n" >>repo/.sparse/sparse &&
>         git -C repo commit -a -m "Update sparse file" &&
> -       git -C repo sparse-checkout reapply &&
>         check_files repo a folder1 &&
>         test_path_is_dir repo/.sparse &&
>         git -C repo checkout HEAD~1 &&
> -       git -C repo sparse-checkout reapply &&
>         check_files repo a &&
> +       test_path_is_dir repo/.sparse &&
> +       git -C repo checkout - &&
> +       check_files repo a folder1 &&
> +       test_path_is_dir repo/.sparse
> +'
> +
> +test_expect_success 'keep definition when in-tree file is missing' '
> +       git -C repo checkout no-in-tree &&
> +       check_files repo a folder1 &&
> +       test_path_is_missing repo/.sparse &&
> +       git -C repo checkout - &&
> +       check_files repo a folder1 &&
>         test_path_is_dir repo/.sparse
>  '
>
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/10] unpack-trees: avoid array out-of-bounds error
  2020-05-07 22:27   ` Junio C Hamano
  2020-05-08 12:19     ` Derrick Stolee
@ 2020-05-20 16:32     ` Elijah Newren
  1 sibling, 0 replies; 32+ messages in thread
From: Elijah Newren @ 2020-05-20 16:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List, newren,
	Jeff King, Taylor Blau, Jonathan Nieder, Derrick Stolee

On Thu, May 7, 2020 at 3:29 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > The loop in warn_conflicted_path() that checks for the count of entries
> > with the same path uses "i+count" for the array entry. However, the loop
> > only verifies that the value of count is below the array size. Fix this
> > by adding i to the condition.
> >
> > I hit this condition during a test of the in-tree sparse-checkout
> > feature, so it is exercised by the end of the series.
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  unpack-trees.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index 9a3ccd9d083..4f880f2da90 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -563,10 +563,11 @@ static int warn_conflicted_path(struct index_state *istate,
> >       add_rejected_path(o, WARNING_SPARSE_UNMERGED_FILE, conflicting_path);
> >
> >       /* Find out how many higher stage entries at same path */
> > -     while (++count < istate->cache_nr &&
> > +     while (i + ++count < istate->cache_nr &&
> >              !strcmp(conflicting_path,
> >                      istate->cache[i+count]->name))
> >               /* do nothing */;
>
> Eek.  Yes, it is obvious that the original is wrong once you point
> it out.  But "i + ++count" looks like a line noise, and funny way
> that lines are wrapped in the original does not help X-<.

Eek, indeed.  :-(

> We may want to fix the style and the grammar while we are at it,
> perhaps like the attached.
>
> In any case, thanks for a fix.
>
>  unpack-trees.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 6bbf58d28e..c38938d96c 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -562,11 +562,11 @@ static int warn_conflicted_path(struct index_state *istate,
>
>         add_rejected_path(o, WARNING_SPARSE_UNMERGED_FILE, conflicting_path);
>
> -       /* Find out how many higher stage entries at same path */
> -       while (++count < istate->cache_nr &&
> -              !strcmp(conflicting_path,
> -                      istate->cache[i+count]->name))
> -               /* do nothing */;
> +       /* Find out how many higher stage entries are at same path */
> +       while ((++count) + i < istate->cache_nr &&
> +              !strcmp(conflicting_path, istate->cache[count + i]->name))
> +               ; /* do nothing */
> +
>         return count;
>  }

Thanks, both.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs
  2020-05-07 13:17 ` [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs Derrick Stolee via GitGitGadget
@ 2020-05-20 16:40   ` Elijah Newren
  2020-05-21  3:49     ` Elijah Newren
  0 siblings, 1 reply; 32+ messages in thread
From: Elijah Newren @ 2020-05-20 16:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, newren, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

On Thu, May 7, 2020 at 6:21 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As we parse the in-tree config files that store the sparse.dir values
> used to create an in-tree sparse-checkout definition, we can easily
> avoid parsing the same file multiple times by using an oidset on those
> blobs. We only parse if the oid is new to the oidset.
>
> This is unlikely to have a major performance benefit right now, but will
> be extremely important when we introduce the sparse.inherit options to
> link multiple files in a directed graph. This oidset will prevent
> infinite loops when cycles exist in that digraph, or exponential blowups
> even in the case of a directed acyclic graph.

I'm still not sure if I like the idea of having a mirror dependency
structure separate from (and duplicative of) the build code; I'm still
mulling that over.

It's good that you've protected against infinite loops.

Is there any reason to prefer swallowing infinite loops rather than
warning or flagging as an error?  (I'm not sure, just thinking my
questions out loud.)

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  sparse-checkout.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/sparse-checkout.c b/sparse-checkout.c
> index 6c58fda9722..d01f4d7b525 100644
> --- a/sparse-checkout.c
> +++ b/sparse-checkout.c
> @@ -9,6 +9,7 @@
>  #include "string-list.h"
>  #include "unpack-trees.h"
>  #include "object-store.h"
> +#include "oidset.h"
>
>  char *get_sparse_checkout_filename(void)
>  {
> @@ -77,9 +78,12 @@ int load_in_tree_pattern_list(struct repository *r,
>                               struct string_list *sl,
>                               struct pattern_list *pl)
>  {
> +       int result = 0;
>         struct index_state *istate = r->index;
>         struct string_list_item *item;
>         struct strbuf path = STRBUF_INIT;
> +       struct oidset set;
> +       oidset_init(&set, 16);
>
>         pl->use_cone_patterns = 1;
>
> @@ -96,24 +100,34 @@ int load_in_tree_pattern_list(struct repository *r,
>                  * Use -1 return to ensure populate_from_existing_patterns()
>                  * skips the sparse-checkout updates.
>                  */
> -               if (pos < 0)
> -                       return -1;
> +               if (pos < 0) {
> +                       result = -1;
> +                       goto cleanup;
> +               }
>
>                 oid = &istate->cache[pos]->oid;
> +
> +               if (oidset_contains(&set, oid))
> +                       continue;
> +
> +               oidset_insert(&set, oid);
> +
>                 type = oid_object_info(r, oid, NULL);
>
>                 if (type != OBJ_BLOB) {
>                         warning(_("expected a file at '%s'; not updating sparse-checkout"),
>                                 oid_to_hex(oid));
> -                       return 1;
> +                       result = 1;
> +                       goto cleanup;
>                 }
>
>                 load_in_tree_from_blob(pl, oid);
>         }
>
> +cleanup:
>         strbuf_release(&path);
> -
> -       return 0;
> +       oidset_clear(&set);
> +       return result;
>  }
>
>  int populate_sparse_checkout_patterns(struct pattern_list *pl)
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/10] [RFC] In-tree sparse-checkout definitions
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2020-05-07 13:17 ` [PATCH 10/10] .sparse: add in-tree sparse-checkout for Git Derrick Stolee via GitGitGadget
@ 2020-05-20 17:38 ` Elijah Newren
  2020-06-17 23:14 ` Elijah Newren
  11 siblings, 0 replies; 32+ messages in thread
From: Elijah Newren @ 2020-05-20 17:38 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, newren, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

On Thu, May 7, 2020 at 6:20 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is based on ds/sparse-allow-empty-working-tree.
>
> BACKGROUND
> ==========
>
> After discussing the future of sparse-checkout at the Git Contributor
> Summit, Minh Tai caught me during the break to introduce a very interesting
> idea: in-tree sparse-checkout definitions.
>
> Here is the context: I was talking about how the Office team is building a
> tool to update the sparse-checkout definition dynamically as their build
> definitions change. They have a very compartmentalized build system, and
> certain "projects" depend on each other. To ensure that builds do not break
> as dependencies change, the plan was to have this tool run as a hook when
> HEAD changes. This is a bit expensive to do, and we were discussing how much
> we need this to be automated versus a part of the build process.
>
> This kind of idea would need to be replicated for every team that wants to
> use sparse-checkout in this way. That doesn't bode well for wide adoption of
> the feature.
>
> IN-TREE SPARSE-CHECKOUT DEFINITIONS
> ===================================
>
> Minh's idea was simple: have sparse-checkout files in the working directory
> and use config to point to them. As these in-tree files update, we can
> automatically update the sparse-checkout definition accordingly. Now, the
> only thing to do would be to ensure that the sparse-checkout files are
> updated when someone updates the build definitions. This requires some extra
> build validation, but would not require special tools built on every client.

I'm not sure how well having a parallel dependency structure will be
received, but the idea of auto-updating does seem really nice.  (I put
a hook in the build system that compares the timestamps on the files
that define dependency structures to the .git/info/sparse-checkout
file, and re-run `git sparse-checkout reapply`, which helps with
updating but is still delayed from the checkout/merge/rebase that
brought in new dependencies.)

> As soon as I sat down to implement this idea, I discovered some
> complications to that basic idea. Each of these sparse-checkout definitions
> should fit a "role" on the team. To continue with the Office analogy,
> suppose we had a definition for developers working on each of Word,
> PowerPoint, and Excel. What happens when someone wants to port a feature
> from Excel to Word? That user would want to take the union of the two
> sparse-checkout definitions. But what does that mean when we are working
> with arbitrary sparse-checkout files?
>
> This leads to the restriction that I built into this feature: we only care
> about "cone mode" definitions. These rely on directory-based prefix matches
> that are very fast. With this restriction, it is simple to understand what
> the "union" operation should do: take all directories that would be included
> by either.

Seems like you're adopting my idea of pushing folks towards "cone
mode" and limiting support for non-code modes[1], eh?  ;-)

I'm totally on board.  :-)

[1] https://lore.kernel.org/git/CABPp-BHKYR9QBcAG_pM6DniaeGS40X=ErKGGsvWa_KhogCZzEA@mail.gmail.com/

> Once we restrict to cone mode, there is no reason to continue storing files
> using the sparse-checkout file format. The list of patterns is larger than
> the input to "git sparse-checkout set" or output from "git sparse-checkout
> list" in cone mode, and more prone to user error. Thus, let's be simpler and
> use a list of directories to specify the sparse-checkout definition.

Ahah, you did come around to using a new config file[2].  ;-)

I like having simpler config files; sounds great.

[2] https://lore.kernel.org/git/c919513a-f41f-2ce8-60ed-e0b0733c0c7f@gmail.com/
 ("another config file")

> This leads to the second issue that complicates things. If a build
> dependency changes to a core library, but there are N "roles" that depend on
> that library, then the person changing that library also needs to change and
> validate N sparse-checkout definitions! This does not scale well. So, let's
> take a page from build dependencies and allow these in-tree sparse-checkout
> definitions to depend on each other!
>
> The end result is a mechanism that should be flexible enough to future needs
> but simple enough to build using existing Git features. The format re-uses
> the config file format. This provides good re-use of code as well as being
> something easy to extend in the future. In Patch 4, I tried to describe
> several alternatives and why this is the best of those options. I'm open to
> suggestions here.
>
> OUTLINE
> =======
>
>  * Patch 1 is a bugfix that should work on its own. I caught it in tests
>    after Patch 5.
>
>
>  * Patches 2-7 implement the feature.
>
>
>  * Patches 8-9 make the Git build system a bit more robust to missing
>    directories.
>
>
>  * Patch 10 adds a .sparse directory to Git as well as two in-tree
>    sparse-checkout definitions. One is for a bare-bones Linux developer and
>    the other adds the compat code for Windows on top.
>
>
>
> As mentioned in patch 10, this allows the following workflow for Git
> contributors that want to "eat our own dogfood" with the partial clone and
> sparse-checkout features:
>
>  $ git clone --sparse --filter=blob:none https://github.com/git/git
> $ cd git
> $ git sparse-checkout set --in-tree .sparse/base.deps
>
> Then, we have a smaller working directory but can still build and test the
> code.
>
> FUTURE DIRECTIONS
> =================
>
> There are definitely ways to make this feature more seamless, or more useful
> to us.
>
>  * It would be nice to extend git clone to have a string value to
>    --sparse=<file> and possibly have it multi-valued. This would initialize
>    the in-tree sparse-checkout definition immediately upon cloning.

Also in terms of future directions, I would like to have the option
for these to be passed to --filter instead of using --filter:none.

And on top of that, it may also make sense for users to be able to
fetch cones of history somehow.  For example, if they started with
`git clone --sparse=featureA ...` and later wanted to add the
'featureB' cone, it'd be nice to be able to run some kind of `git
fetch --sparse=featureB ...` command when they know they are on a good
network to download the history of objects that touched the featureB
cone instead of relying on the normal lazy fetching of partial clones.

This kind of thing could be especially useful if, say, we were in a
pandemic and everyone was forced to work from home.  But it'd also be
useful if you just had people travelling a lot and stuck in third
world places with low connectivity, like CalTrain.  ;-)

>  * I think there are some ways we could reorganize the Git codebase to make
>    our "most basic" sparse-checkout file have even fewer files. Mostly, it
>    would require modifying the build to ask "do these files exist on-disk?
>    if not, then disable this build option." This would also require adding
>    test dependencies that also disable some tests when those do not exist.
>
>
>  * Currently, if we have an in-tree sparse-checkout definition, then we
>    can't add more directories to it manually. If someone runs git
>    sparse-checkout set --in-tree <file> followed by git sparse-checkout add
>    <dir>, then will disappear as it is overridden by the in-tree definition.
>    The only way I can think to get around this would be to add these
>    directories through Git config, since we were using the sparse-checkout
>    file as the storage of "what directories do I care about?" but it is no
>    longer the source of truth in this situation. I'm open to ideas here, and
>    it would be nice to have this interaction hardened before moving the
>    series out of RFC status.

This seems like a big problem to me.  Actually, it sounds like two big problems.

1) non-extensibility

I suspect most will be happy enough with the in-tree definitions, but
there are always a few users that want to add to the pre-defined
sparsity sets to get some additional directories.  We could use
.gitignore as an example; although people have shared ignore rules in
.gitignore, we still allow user-defined additional ignore rules that
are not meant to be shared.  Perhaps we can allow users to use some
pre-defined additional file (similar to .git/info/exclude) or even
user-defined additional file (similar to core.excludesFile)?

We would still need to be careful to not allow the in-tree definitions
to depend on files that are not in-tree, of course.

(Even with my `sparsify` script which allowed things like `./sparsify
--modules moduleA moduleB` and which walked build system dependencies
to get the right directories, I still allowed additional directories
to be specified within a .git/info/sparse-checkout-user file.  Not
sure I liked the location of the file (it's sometimes hard to find for
users of worktrees), or the basename of the file I randomly chose that
day, but it came in handy for a few people.)

2) Ugly, overly verbose (and even error-prone?) UI.

If `git sparse-checkout set --in-tree <file>` means that any
subsequent checkout will blow apart any user-ran `git sparse-checkout
add <dir>`, then we really need to give at least warning and maybe
even error messages if people are using in-tree mode and use either
set or add.  Also, this feels like we're just about deprecating the
usage of set and add without the --in-tree flag, which could lead to
years of questions of "Why do I have to specify --in-tree every time I
run these commands?"  Can we avoid that somehow?

Maybe we just take advantage of the huge backward compatibility
warning we put in place and require non-in-tree mode to specify an
additional flag?  Maybe we just check the argument to set or add when
in cone mode, and if it's a file (instead of a directory) then we
automatically assume --in-tree (or attempt to parse it first and throw
an error if it isn't in the --in-tree format)?  Maybe we pick a
different verb than add or set ('use') and guide people to it instead
of `add` and `set`?

Also, as a side note, can people specify multiple in-tree files at
once (`git sparse-checkout set --in-tree <file1> <file2>`)?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] sparse-checkout: allow in-tree definitions
  2020-05-08 15:40     ` Derrick Stolee
@ 2020-05-20 17:52       ` Elijah Newren
  2020-06-17 23:07         ` Elijah Newren
  0 siblings, 1 reply; 32+ messages in thread
From: Elijah Newren @ 2020-05-20 17:52 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, Git Mailing List,
	newren, Jeff King, Taylor Blau, Jonathan Nieder, Derrick Stolee

On Fri, May 8, 2020 at 8:42 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> One of the difficulties of using the sparse-checkout feature is not
> >> knowing which directories are absolutely needed for working in a portion
> >> of the repository. Some of this can be documented in README files or
> >> included in a bootstrapping tool along with the repository. This is done
> >> in an ad-hoc way by every project that wants to use it.
> >>
> >> Let's make this process easier for users by creating a way to define a
> >> useful sparse-checkout definition inside the Git tree data. This has
> >> several benefits. In particular, the data is available to anyone who has
> >> a copy of the repository without needing a different data source.
> >> Second, the needs of the repository can change over time and Git can
> >> present a way to automatically update the working directory as these
> >> sparse-checkout definitions change over time.
> >
> > And two lines of development can merge them together?
> >
> > Any time a new "feature" pops up that would eventually affect how
> > "git clone" and "git checkout" work based on untrusted user data, we
> > need to make sure there is no negative security implications.
> >
> > If it only boils down to "we have files that can record list of
> > leading directory names and without offering extra 'flexibility'", I
> > guess there aren't all that much that a malicious sparse definition
> > can do and we would be safe, though.
>
> Yes. I hope that we can be extremely careful with this feature.
> The RFC status of this series implicitly includes the question
> "Should we do this at all?" I think the benefits outweigh the
> risks, but we can minimize those risks with very careful design
> and implementation.
>
> >> To use this feature, add the "--in-tree" option when setting or adding
> >> directories to the sparse-checkout definition. For example:
> >>
> >>   $ git sparse-checkout set --in-tree .sparse/base
> >>   $ git sparse-checkout add --in-tree .sparse/extra
> >>
> >> These commands add values to the multi-valued config setting
> >> "sparse.inTree". When updating the sparse-checkout definition, these
> >> values describe paths in the repository to find the sparse-checkout
> >> data. After the commands listed earlier, we expect to see the following
> >> in .git/config.worktree:
> >>
> >>      [sparse]
> >>              intree = .sparse/base
> >>              intree = .sparse/extra
> >
> > What does this say in human words?  "These two tracked files specify
> > which paths should be in the working tree"?  Spelling it out here
> > would help readers of this commit.
>
> You got it. Sounds good.
>
> >> When applying the sparse-checkout definitions from this config, the
> >> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded.
> >
> > OK, so end-user edit to the working tree copy or what is added to
> > the index does not count and only the committed version gets used.
> >
> > That makes it simple---I was wondering how we would operate when
> > merging a branch with different contents in the .sparse/* files
> > until the conflicts are resolved.
>
> It's worth testing this case so we can be sure what happens.

During a merge or rebase or checkout -m, what happens if .sparse/extra
has the following working tree content:

[sparse]
    dir = D
    dir = X
<<<<<< HEAD
    dir = Y
|||||| MERGE_BASE
======
    inherit = .sparse/tools
>>>>>>  MERGE_HEAD
    inherit = .sparse/base

and, of course, three different entries in the index?

Also, do we use the version of the --in-tree file from the latest
commit, from the index, or from the working tree?  (This is a question
not only for merge and rebase, but also checkout with dirty changes
and even checkout -m.)  Which one "wins"?

And what if the user updates and commits an ill-formed version of the
file -- is it equivalent to getting an empty cone with just the
toplevel directory, equivalent to getting a complete checkout of
everything, or something else?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/10] sparse-checkout: define in-tree dependencies
  2020-05-07 13:17 ` [PATCH 07/10] sparse-checkout: define in-tree dependencies Derrick Stolee via GitGitGadget
@ 2020-05-20 18:10   ` Elijah Newren
  2020-05-30 17:26     ` Elijah Newren
  0 siblings, 1 reply; 32+ messages in thread
From: Elijah Newren @ 2020-05-20 18:10 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, newren, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

On Thu, May 7, 2020 at 6:22 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As mentioned in the definition of the in-tree sparse-checkout
> definitions, a large repository could have many different roles for the
> users of that repository. That means that many different in-tree
> sparse-checkout definitions need to exist. If a directory needs to be
> added to many of these roles, then many of these files need to be
> edited. This is too much work to scale properly.
>
> Instead, let's make these definitions easier to maintain by following
> basic principles of build dependencies. Add simple links between these
> in-tree files using the new "sparse.inherit" config key. This multi-
> valued config setting specifies more files that are needed to define the
> sparse-checkout for a role. By separating out the common directories
> into a base file, the more specialized roles can focus their files on
> the directories only needed by those roles and leave the common
> directories to that base file.
>
> For example, suppose that .sparse/base has the following data:
>
>         [sparse]
>                 dir = A
>                 dir = B/C
>                 dir = D/E/F
>
> and .sparse/extra has the following data:
>
>         [sparse]
>                 dir = D
>                 dir = X
>                 inherit = .sparse/base
>
> Now, a user can run "git sparse-checkout set --in-tree .sparse/extra"
> and the resulting directories will define their cone-mode
> sparse-checkout patterns:
>
>         A
>         B/C
>         D
>         X
>
> Thus, the resulting sparse-checkout definition is the union of the
> definitions from .sparse/base and .sparse/extra, but the user only has
> one value of sparse.inTree of ".sparse/extra".
>
> It is simple to modify our existing logic to explore the directed graph
> created by the sparse.inherit values. We simply need to append to the
> string_list containing the list of in-tree files. Since we never repeat
> parsing on the same blob oid, this will not lead to infinite loops or
> exponential blowups in the parsing time.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt | 26 ++++++++++++++++++++++
>  sparse-checkout.c                     | 27 ++++++++++++++++++-----
>  sparse-checkout.h                     |  1 +
>  t/t1091-sparse-checkout-builtin.sh    | 31 +++++++++++++++++++++++++++
>  4 files changed, 80 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index c1713ebb1d2..941658e0011 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -238,6 +238,32 @@ definition according to the files available at the new commit. If any of
>  the specified files do not exist at the new commit, then the sparse-checkout
>  definition will not change.
>
> +In a very large repository, there may be need to have multiple of these
> +in-tree sparse-checkout definitions to fit the roles of multiple types of
> +users. As each definition grows and the number of user types grow, it can
> +be difficult to manage updating all of the definitions when a common core
> +is modified somehow. For this reason, the in-tree pattern sets can inherit
> +the directories from other in-tree pattern sets. Use the `sparse.inherit`
> +option to specify other files in the tree.
> +
> +For example, suppose the file listed earlier is at `.sparse/core`. Another
> +file could be stored as `.sparse/extra` with contents
> +
> +----------------------------------
> +[sparse]
> +       dir = X
> +       dir = Y/Z
> +       inherit = .sparse/core
> +----------------------------------

So you apparently expect full path rather than a relative path.  This
example does imply this, but it might be good to explicitly state it.

What happens if the user specifies a non-existent path, or perhaps
equivalently, didn't understand and specified the path using a
relative path specification?  What if the user specifies a path that
doesn't exist in the commit, but happens to exist in someones's
working directory?  (And maybe even did it intentionally as a way to
cheat and add user-defined additional paths to include?  Do we want to
allow that, or do we want to enforce that extra includes use some kind
of pre-defined path?)  What if they specify some path that is invalid
to record in a git commit (.git/mydeps or
../../../../../../../otherdeps) but happens to exist on some machines?
 (Are there future security vulnerabilities of any sort going down
this path?)

> +
> +Then, if you run `git sparse-checkout set --in-tree .sparse/extra`, the
> +sparse-checkout definition will include `X` and `Y/Z` from `.sparse/extra`
> +as well as `A`, `B/C`, and `D/E/F` from `.sparse/core`. This is similar
> +to specifying both `.sparse/core` and `.sparse/extra` in the `set`
> +subcommand, but has a slight advantage. If the `.sparse/extra` file changes
> +the set of `inherit` files, then your sparse-checkout definition will
> +update accordingly as you switch between commits.
> +
>
>  SUBMODULES
>  ----------
> diff --git a/sparse-checkout.c b/sparse-checkout.c
> index d01f4d7b525..4edeab49a10 100644
> --- a/sparse-checkout.c
> +++ b/sparse-checkout.c
> @@ -66,12 +66,29 @@ static int sparse_dir_cb(const char *var, const char *value, void *data)
>         return 0;
>  }
>
> +static int sparse_inherit_cb(const char *var, const char *value, void *data)
> +{
> +       struct string_list *sl = (struct string_list *)data;
> +
> +       if (!strcmp(var, SPARSE_CHECKOUT_INHERIT))
> +               string_list_append(sl, value);

So anyone can record lots of additional items within these files, and
we'll ignore them and only pay attention to certain fields.  That
probably allows future versions of git to write new values and old
versions of git to not choke on them.  Does it lead to entrepreneurial
developers cramming random additional fields in these files that
possibly conflict with fieldnames that we want to use in the future?
Do we want to document what is allowed and what is not?

> +
> +       return 0;
> +}
> +
>  static int load_in_tree_from_blob(struct pattern_list *pl,
> -                                 struct object_id *oid)
> +                                 struct object_id *oid,
> +                                 struct string_list *inherit)
>  {
> -       return git_config_from_blob_oid(sparse_dir_cb,
> -                                       SPARSE_CHECKOUT_DIR,
> -                                       oid, pl);
> +       if (git_config_from_blob_oid(sparse_dir_cb,
> +                                    SPARSE_CHECKOUT_DIR,
> +                                    oid, pl))
> +               return 1;
> +       if (git_config_from_blob_oid(sparse_inherit_cb,
> +                                    SPARSE_CHECKOUT_INHERIT,
> +                                    oid, inherit))
> +               return 1;
> +       return 0;
>  }
>
>  int load_in_tree_pattern_list(struct repository *r,
> @@ -121,7 +138,7 @@ int load_in_tree_pattern_list(struct repository *r,
>                         goto cleanup;
>                 }
>
> -               load_in_tree_from_blob(pl, oid);
> +               load_in_tree_from_blob(pl, oid, sl);
>         }
>
>  cleanup:
> diff --git a/sparse-checkout.h b/sparse-checkout.h
> index fb0ba48524a..8b766ea38fb 100644
> --- a/sparse-checkout.h
> +++ b/sparse-checkout.h
> @@ -5,6 +5,7 @@
>  #include "repository.h"
>
>  #define SPARSE_CHECKOUT_DIR "sparse.dir"
> +#define SPARSE_CHECKOUT_INHERIT "sparse.inherit"
>  #define SPARSE_CHECKOUT_IN_TREE "sparse.intree"
>
>  struct pattern_list;
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index fdaafba5377..b2389e5b5e6 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -716,4 +716,35 @@ test_expect_success 'keep definition when in-tree file is missing' '
>         test_path_is_dir repo/.sparse
>  '
>
> +test_expect_success 'inherit definition from other files' '
> +       cat >repo/.sparse/inherit <<-EOF &&
> +       [sparse]
> +               inherit = .sparse/sparse
> +               inherit = .sparse/deep
> +               inherit = .sparse/deeper1
> +       EOF
> +       git -C repo add .sparse &&
> +       git -C repo commit -m "Add inherited file" &&
> +       git -C repo sparse-checkout set --in-tree .sparse/inherit &&
> +       check_files repo a deep folder1 &&
> +       check_files repo/deep a deeper1 deeper2 &&
> +       test_path_is_dir repo/.sparse &&
> +       cat >repo/.sparse/sparse <<-EOF &&
> +       [sparse]
> +               dir = .sparse
> +       EOF
> +       git -C repo commit -a -m "drop folder1 from sparse" &&
> +       check_files repo a deep &&
> +       check_files repo/deep a deeper1 deeper2 &&
> +       test_path_is_dir repo/.sparse
> +'
> +
> +test_expect_success 'inherit files can have cycles' '
> +       echo "\tinherit = .sparse/inherit" >>repo/.sparse/sparse &&
> +       git -C repo commit -a -m "create inherit cycle" &&
> +       check_files repo a deep &&
> +       check_files repo/deep a deeper1 deeper2 &&
> +       test_path_is_dir repo/.sparse
> +'
> +
>  test_done
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs
  2020-05-20 16:40   ` Elijah Newren
@ 2020-05-21  3:49     ` Elijah Newren
  2020-05-21 17:54       ` Derrick Stolee
  0 siblings, 1 reply; 32+ messages in thread
From: Elijah Newren @ 2020-05-21  3:49 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

Replying to my own questions...

On Wed, May 20, 2020 at 9:40 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Thu, May 7, 2020 at 6:21 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > As we parse the in-tree config files that store the sparse.dir values
> > used to create an in-tree sparse-checkout definition, we can easily
> > avoid parsing the same file multiple times by using an oidset on those
> > blobs. We only parse if the oid is new to the oidset.
> >
> > This is unlikely to have a major performance benefit right now, but will
> > be extremely important when we introduce the sparse.inherit options to
> > link multiple files in a directed graph. This oidset will prevent
> > infinite loops when cycles exist in that digraph, or exponential blowups
> > even in the case of a directed acyclic graph.
>
> I'm still not sure if I like the idea of having a mirror dependency
> structure separate from (and duplicative of) the build code; I'm still
> mulling that over.

I mentioned this to a few other buildsystem folks at $DAYJOB.  They
were strongly opposed to having more than one source of truth, but
generating the git in-tree sparse values from the official build
system files, with commit hooks and build system checks to make sure
they get updated seemed like it'd be fine or not concern them much.

> It's good that you've protected against infinite loops.
>
> Is there any reason to prefer swallowing infinite loops rather than
> warning or flagging as an error?  (I'm not sure, just thinking my
> questions out loud.)

The buildsystem folks also reminded me that we have cylic dependencies
already, and although it's absolutely ugly, it is somewhat forced on
us by a combination of different external tools that we can't change.
As such, warnings or errors would be really annoying and we'd be one
of the ones to want to turn them off.  So drop that idea from me.  :-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs
  2020-05-21  3:49     ` Elijah Newren
@ 2020-05-21 17:54       ` Derrick Stolee
  0 siblings, 0 replies; 32+ messages in thread
From: Derrick Stolee @ 2020-05-21 17:54 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

On 5/20/2020 11:49 PM, Elijah Newren wrote:
> Replying to my own questions...
> 
> On Wed, May 20, 2020 at 9:40 AM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Thu, May 7, 2020 at 6:21 AM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>>
>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>
>>> As we parse the in-tree config files that store the sparse.dir values
>>> used to create an in-tree sparse-checkout definition, we can easily
>>> avoid parsing the same file multiple times by using an oidset on those
>>> blobs. We only parse if the oid is new to the oidset.
>>>
>>> This is unlikely to have a major performance benefit right now, but will
>>> be extremely important when we introduce the sparse.inherit options to
>>> link multiple files in a directed graph. This oidset will prevent
>>> infinite loops when cycles exist in that digraph, or exponential blowups
>>> even in the case of a directed acyclic graph.
>>
>> I'm still not sure if I like the idea of having a mirror dependency
>> structure separate from (and duplicative of) the build code; I'm still
>> mulling that over.
> 
> I mentioned this to a few other buildsystem folks at $DAYJOB.  They
> were strongly opposed to having more than one source of truth, but
> generating the git in-tree sparse values from the official build
> system files, with commit hooks and build system checks to make sure
> they get updated seemed like it'd be fine or not concern them much.

The intention is that these files are "downstream" from the build
system. As the build changes, it would adjust the config files to
tell Git what to do.

>> It's good that you've protected against infinite loops.
>>
>> Is there any reason to prefer swallowing infinite loops rather than
>> warning or flagging as an error?  (I'm not sure, just thinking my
>> questions out loud.)
> 
> The buildsystem folks also reminded me that we have cylic dependencies
> already, and although it's absolutely ugly, it is somewhat forced on
> us by a combination of different external tools that we can't change.
> As such, warnings or errors would be really annoying and we'd be one
> of the ones to want to turn them off.  So drop that idea from me.  :-)

This is more common than one might think!

Thanks for taking a look,
-Stolee

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/10] sparse-checkout: define in-tree dependencies
  2020-05-20 18:10   ` Elijah Newren
@ 2020-05-30 17:26     ` Elijah Newren
  0 siblings, 0 replies; 32+ messages in thread
From: Elijah Newren @ 2020-05-30 17:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, newren, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

Just thought of another angle here...  (This feature you suggest
sounds interesting, but I'm still mulling over in my head whether it's
possible to make it work and how...)

On Wed, May 20, 2020 at 11:10 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Thu, May 7, 2020 at 6:22 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

> > --- a/Documentation/git-sparse-checkout.txt
> > +++ b/Documentation/git-sparse-checkout.txt
> > @@ -238,6 +238,32 @@ definition according to the files available at the new commit. If any of
> >  the specified files do not exist at the new commit, then the sparse-checkout
> >  definition will not change.
> >
> > +In a very large repository, there may be need to have multiple of these
> > +in-tree sparse-checkout definitions to fit the roles of multiple types of
> > +users. As each definition grows and the number of user types grow, it can
> > +be difficult to manage updating all of the definitions when a common core
> > +is modified somehow. For this reason, the in-tree pattern sets can inherit
> > +the directories from other in-tree pattern sets. Use the `sparse.inherit`
> > +option to specify other files in the tree.
> > +
> > +For example, suppose the file listed earlier is at `.sparse/core`. Another
> > +file could be stored as `.sparse/extra` with contents
> > +
> > +----------------------------------
> > +[sparse]
> > +       dir = X
> > +       dir = Y/Z
> > +       inherit = .sparse/core
> > +----------------------------------
>
...
> What happens if the user specifies a non-existent path, or perhaps
> equivalently, didn't understand and specified the path using a
> relative path specification?  What if the user specifies a path that
> doesn't exist in the commit, but happens to exist in someones's
> working directory?  (And maybe even did it intentionally as a way to
> cheat and add user-defined additional paths to include?  Do we want to
> allow that, or do we want to enforce that extra includes use some kind
> of pre-defined path?)  What if they specify some path that is invalid
> to record in a git commit (.git/mydeps or
> ../../../../../../../otherdeps) but happens to exist on some machines?
>  (Are there future security vulnerabilities of any sort going down
> this path?)

Although the above considers non-existent paths, perhaps a more common
way that a non-existent path specification arises is when people do
add paths that exist, but then a later merge or rebase or even 'git rm
...' removes them.  In such a case, the path used to be valid, it just
isn't any more.  Perhaps this suggests that the right fallback is to
print a warning and then just ignore that path specification?

(Still have no clue how to handle merge conflicts, as discussed on
patch 4, but maybe this at least addresses one case.)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] sparse-checkout: allow in-tree definitions
  2020-05-20 17:52       ` Elijah Newren
@ 2020-06-17 23:07         ` Elijah Newren
  2020-06-18  8:18           ` Son Luong Ngoc
  0 siblings, 1 reply; 32+ messages in thread
From: Elijah Newren @ 2020-06-17 23:07 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, Git Mailing List,
	newren, Jeff King, Taylor Blau, Jonathan Nieder, Derrick Stolee,
	Son Luong Ngoc

On Wed, May 20, 2020 at 10:52 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Fri, May 8, 2020 at 8:42 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> > > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > >
> > >> One of the difficulties of using the sparse-checkout feature is not
> > >> knowing which directories are absolutely needed for working in a portion
> > >> of the repository. Some of this can be documented in README files or
> > >> included in a bootstrapping tool along with the repository. This is done
> > >> in an ad-hoc way by every project that wants to use it.
> > >>
> > >> Let's make this process easier for users by creating a way to define a
> > >> useful sparse-checkout definition inside the Git tree data. This has
> > >> several benefits. In particular, the data is available to anyone who has
> > >> a copy of the repository without needing a different data source.
> > >> Second, the needs of the repository can change over time and Git can
> > >> present a way to automatically update the working directory as these
> > >> sparse-checkout definitions change over time.
> > >
> > > And two lines of development can merge them together?
> > >
> > > Any time a new "feature" pops up that would eventually affect how
> > > "git clone" and "git checkout" work based on untrusted user data, we
> > > need to make sure there is no negative security implications.
> > >
> > > If it only boils down to "we have files that can record list of
> > > leading directory names and without offering extra 'flexibility'", I
> > > guess there aren't all that much that a malicious sparse definition
> > > can do and we would be safe, though.
> >
> > Yes. I hope that we can be extremely careful with this feature.
> > The RFC status of this series implicitly includes the question
> > "Should we do this at all?" I think the benefits outweigh the
> > risks, but we can minimize those risks with very careful design
> > and implementation.
> >
> > >> To use this feature, add the "--in-tree" option when setting or adding
> > >> directories to the sparse-checkout definition. For example:
> > >>
> > >>   $ git sparse-checkout set --in-tree .sparse/base
> > >>   $ git sparse-checkout add --in-tree .sparse/extra
> > >>
> > >> These commands add values to the multi-valued config setting
> > >> "sparse.inTree". When updating the sparse-checkout definition, these
> > >> values describe paths in the repository to find the sparse-checkout
> > >> data. After the commands listed earlier, we expect to see the following
> > >> in .git/config.worktree:
> > >>
> > >>      [sparse]
> > >>              intree = .sparse/base
> > >>              intree = .sparse/extra
> > >
> > > What does this say in human words?  "These two tracked files specify
> > > which paths should be in the working tree"?  Spelling it out here
> > > would help readers of this commit.
> >
> > You got it. Sounds good.
> >
> > >> When applying the sparse-checkout definitions from this config, the
> > >> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded.
> > >
> > > OK, so end-user edit to the working tree copy or what is added to
> > > the index does not count and only the committed version gets used.
> > >
> > > That makes it simple---I was wondering how we would operate when
> > > merging a branch with different contents in the .sparse/* files
> > > until the conflicts are resolved.
> >
> > It's worth testing this case so we can be sure what happens.
>
> During a merge or rebase or checkout -m, what happens if .sparse/extra
> has the following working tree content:
>
> [sparse]
>     dir = D
>     dir = X
> <<<<<< HEAD
>     dir = Y
> |||||| MERGE_BASE
> ======
>     inherit = .sparse/tools
> >>>>>>  MERGE_HEAD
>     inherit = .sparse/base
>
> and, of course, three different entries in the index?
>
> Also, do we use the version of the --in-tree file from the latest
> commit, from the index, or from the working tree?  (This is a question
> not only for merge and rebase, but also checkout with dirty changes
> and even checkout -m.)  Which one "wins"?
>
> And what if the user updates and commits an ill-formed version of the
> file -- is it equivalent to getting an empty cone with just the
> toplevel directory, equivalent to getting a complete checkout of
> everything, or something else?

Son pointed out that mercurial has a 'sparse' extension that has some
possible ideas of things we could do here; see
https://lore.kernel.org/git/CABPp-BGLBmWXrmPsTogyBFMgwYbHjN39oWbU=qDWroU1_fJaoQ@mail.gmail.com/
for some further discussion.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/10] [RFC] In-tree sparse-checkout definitions
  2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
                   ` (10 preceding siblings ...)
  2020-05-20 17:38 ` [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Elijah Newren
@ 2020-06-17 23:14 ` Elijah Newren
  2020-06-18  1:42   ` Derrick Stolee
  11 siblings, 1 reply; 32+ messages in thread
From: Elijah Newren @ 2020-06-17 23:14 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

Hi,

Another late addition...

On Thu, May 7, 2020 at 6:20 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:

> IN-TREE SPARSE-CHECKOUT DEFINITIONS
> ===================================
>
> Minh's idea was simple: have sparse-checkout files in the working directory
> and use config to point to them. As these in-tree files update, we can
> automatically update the sparse-checkout definition accordingly. Now, the
> only thing to do would be to ensure that the sparse-checkout files are
> updated when someone updates the build definitions. This requires some extra
> build validation, but would not require special tools built on every client.

"In-tree" still bugs me after a few weeks; the wording seems slightly
awkward.  I don't have a good suggestion, but I'm curious if there's a
better term.

But I really came here to comment on another issue I think I glossed
over the first time around.  I'm curious if all module definition
files have to exist in the working directory, as possibly suggested
above, or if we can allow them to just exist in the index.  To give
you a flavor for what I mean, with my sparsify tool people can do
things like:
    ./sparsify --modules MODULE_A
which provides MODULE_A and it's dependencies while removing all other
directories.  If MODULE_B, is not a dependency (direct or transitive)
of MODULE_A, it will not exist in the working directory after this
step.  Our equivalent of the "in-tree" definition of MODULE_B exists
*in* the directory for MODULE_B, because it seems to make sense for
us.  I want people to be able to do
    ./sparsify --modules MODULE_B
and have it correctly check out all the necessary files even though
the definition of MODULE_B wasn't even in the working directory at the
time the command ran.  (The sparsify script knows to check the working
directory first, then fall back to the index).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/10] [RFC] In-tree sparse-checkout definitions
  2020-06-17 23:14 ` Elijah Newren
@ 2020-06-18  1:42   ` Derrick Stolee
  2020-06-18  1:59     ` Elijah Newren
  0 siblings, 1 reply; 32+ messages in thread
From: Derrick Stolee @ 2020-06-18  1:42 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

On 6/17/2020 7:14 PM, Elijah Newren wrote:
> Hi,
> 
> Another late addition...
> 
> On Thu, May 7, 2020 at 6:20 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> 
>> IN-TREE SPARSE-CHECKOUT DEFINITIONS
>> ===================================
>>
>> Minh's idea was simple: have sparse-checkout files in the working directory
>> and use config to point to them. As these in-tree files update, we can
>> automatically update the sparse-checkout definition accordingly. Now, the
>> only thing to do would be to ensure that the sparse-checkout files are
>> updated when someone updates the build definitions. This requires some extra
>> build validation, but would not require special tools built on every client.
> 
> "In-tree" still bugs me after a few weeks; the wording seems slightly
> awkward.  I don't have a good suggestion, but I'm curious if there's a
> better term.

I am open to suggestions. It reminds me of the two hardest problems
in software engineering:

	1. concurrency
	2. naming things
	3. off-by-one errors

> But I really came here to comment on another issue I think I glossed
> over the first time around.  I'm curious if all module definition
> files have to exist in the working directory, as possibly suggested
> above, or if we can allow them to just exist in the index.  To give
> you a flavor for what I mean, with my sparsify tool people can do
> things like:
>     ./sparsify --modules MODULE_A
> which provides MODULE_A and it's dependencies while removing all other
> directories.  If MODULE_B, is not a dependency (direct or transitive)
> of MODULE_A, it will not exist in the working directory after this
> step.  Our equivalent of the "in-tree" definition of MODULE_B exists
> *in* the directory for MODULE_B, because it seems to make sense for
> us.  I want people to be able to do
>     ./sparsify --modules MODULE_B
> and have it correctly check out all the necessary files even though
> the definition of MODULE_B wasn't even in the working directory at the
> time the command ran.  (The sparsify script knows to check the working
> directory first, then fall back to the index).

I think one tricky part of my RFC is that it _only_ looks at the
index. This allows us to read the contents even when the files are
not part of the current sparse-checkout definition.

You mentioned in another thread that it is a bit unwieldy for a user
to rely on a committed (or staged?) file, so adding the ability to
check the working directory first is interesting. I wonder how the
timing comes into play when changing HEAD to a new commit? Seems
tricky, but solvable.

Thanks,
-Stolee




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/10] [RFC] In-tree sparse-checkout definitions
  2020-06-18  1:42   ` Derrick Stolee
@ 2020-06-18  1:59     ` Elijah Newren
  2020-06-18  3:01       ` Derrick Stolee
  0 siblings, 1 reply; 32+ messages in thread
From: Elijah Newren @ 2020-06-18  1:59 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Jeff King,
	Taylor Blau, Jonathan Nieder, Derrick Stolee

On Wed, Jun 17, 2020 at 6:42 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/17/2020 7:14 PM, Elijah Newren wrote:
> > Hi,
> >
> > Another late addition...
> >
> > On Thu, May 7, 2020 at 6:20 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >
> >> IN-TREE SPARSE-CHECKOUT DEFINITIONS
> >> ===================================
> >>
> >> Minh's idea was simple: have sparse-checkout files in the working directory
> >> and use config to point to them. As these in-tree files update, we can
> >> automatically update the sparse-checkout definition accordingly. Now, the
> >> only thing to do would be to ensure that the sparse-checkout files are
> >> updated when someone updates the build definitions. This requires some extra
> >> build validation, but would not require special tools built on every client.
> >
> > "In-tree" still bugs me after a few weeks; the wording seems slightly
> > awkward.  I don't have a good suggestion, but I'm curious if there's a
> > better term.
>
> I am open to suggestions. It reminds me of the two hardest problems
> in software engineering:
>
>         1. concurrency
>         2. naming things
>         3. off-by-one errors

:-)

> > But I really came here to comment on another issue I think I glossed
> > over the first time around.  I'm curious if all module definition
> > files have to exist in the working directory, as possibly suggested
> > above, or if we can allow them to just exist in the index.  To give
> > you a flavor for what I mean, with my sparsify tool people can do
> > things like:
> >     ./sparsify --modules MODULE_A
> > which provides MODULE_A and it's dependencies while removing all other
> > directories.  If MODULE_B, is not a dependency (direct or transitive)
> > of MODULE_A, it will not exist in the working directory after this
> > step.  Our equivalent of the "in-tree" definition of MODULE_B exists
> > *in* the directory for MODULE_B, because it seems to make sense for
> > us.  I want people to be able to do
> >     ./sparsify --modules MODULE_B
> > and have it correctly check out all the necessary files even though
> > the definition of MODULE_B wasn't even in the working directory at the
> > time the command ran.  (The sparsify script knows to check the working
> > directory first, then fall back to the index).
>
> I think one tricky part of my RFC is that it _only_ looks at the
> index. This allows us to read the contents even when the files are
> not part of the current sparse-checkout definition.
>
> You mentioned in another thread that it is a bit unwieldy for a user
> to rely on a committed (or staged?) file, so adding the ability to
> check the working directory first is interesting. I wonder how the
> timing comes into play when changing HEAD to a new commit? Seems
> tricky, but solvable.

Isn't that essentially the same timing issue that comes into play if
you only look at the index, and are changing HEAD to a new commit?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/10] [RFC] In-tree sparse-checkout definitions
  2020-06-18  1:59     ` Elijah Newren
@ 2020-06-18  3:01       ` Derrick Stolee
  2020-06-18  5:03         ` Elijah Newren
  0 siblings, 1 reply; 32+ messages in thread
From: Derrick Stolee @ 2020-06-18  3:01 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Jeff King,
	Taylor Blau, Jonathan Nieder, Derrick Stolee

On 6/17/2020 9:59 PM, Elijah Newren wrote:
> On Wed, Jun 17, 2020 at 6:42 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 6/17/2020 7:14 PM, Elijah Newren wrote:
>> You mentioned in another thread that it is a bit unwieldy for a user
>> to rely on a committed (or staged?) file, so adding the ability to
>> check the working directory first is interesting. I wonder how the
>> timing comes into play when changing HEAD to a new commit? Seems
>> tricky, but solvable.
> 
> Isn't that essentially the same timing issue that comes into play if
> you only look at the index, and are changing HEAD to a new commit?

It adds to the complexity. We can inspect the new index for the
in-tree definition and update the skip-worktree bits before actually
changing the working directory to match the new index. However, if
we trust the working directory copy over the index copy, then we
need to also decide if the working directory copy _would_ change
in this move before using it.

Of course, maybe I'm making it over-complicated. I still know so
little about the index. I got into this feature due to a simple
pattern-matching problem that I could understand, but now I'm
getting lost in index states and dirty statuses. ;)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/10] [RFC] In-tree sparse-checkout definitions
  2020-06-18  3:01       ` Derrick Stolee
@ 2020-06-18  5:03         ` Elijah Newren
  0 siblings, 0 replies; 32+ messages in thread
From: Elijah Newren @ 2020-06-18  5:03 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Jeff King,
	Taylor Blau, Jonathan Nieder, Derrick Stolee

On Wed, Jun 17, 2020 at 8:01 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/17/2020 9:59 PM, Elijah Newren wrote:
> > On Wed, Jun 17, 2020 at 6:42 PM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 6/17/2020 7:14 PM, Elijah Newren wrote:
> >> You mentioned in another thread that it is a bit unwieldy for a user
> >> to rely on a committed (or staged?) file, so adding the ability to
> >> check the working directory first is interesting. I wonder how the
> >> timing comes into play when changing HEAD to a new commit? Seems
> >> tricky, but solvable.
> >
> > Isn't that essentially the same timing issue that comes into play if
> > you only look at the index, and are changing HEAD to a new commit?
>
> It adds to the complexity. We can inspect the new index for the
> in-tree definition and update the skip-worktree bits before actually
> changing the working directory to match the new index. However, if
> we trust the working directory copy over the index copy, then we
> need to also decide if the working directory copy _would_ change
> in this move before using it.
>
> Of course, maybe I'm making it over-complicated. I still know so
> little about the index. I got into this feature due to a simple
> pattern-matching problem that I could understand, but now I'm
> getting lost in index states and dirty statuses. ;)

Honestly, I think your first cut for switching branches with this new
feature should just be:
  1) Do a switch/checkout exactly the way it's done today already:
    1a) Load the index and existing sparsity rules (from worktree then
index), according to what existed before the branch switch
    1b) Do the appropriate 1-way or 2-way merge from unpack_trees() to
move to the right branch (as builtin/checkout.c already does)
  2) *If* using in-tree sparsity feature, re-load the sparsity rules
(from worktree then index) and run the equivalent of `sparse-checkout
reapply`

Not only does this avoid messing up any code for the normal case, I'm
pretty sure this gets the behavior the user expects in all the special
cases.  There is one big downside, and a little downside.  The big
downside is that it'll have two progress meters instead of just one
(much like sparse-checkout init && sparse-checkout set do today).  The
little downside is that this isn't optimal from a performance
standpoint, for two reasons:
(a) some files may be updated in the working tree in step 1b despite
the fact that they'll be removed in step 2.
(b) step 2 may just be an expensive no-op, in fact I suspect it will
be most the time

The reason I think the performance isn't a big deal is because:
C) the fact that (b) is a problem means (a) is not often an issue --
the sparsity patterns are usually the same.
D) Even if the sparsity patterns differ, it's often the case that
files within the tree won't change (people tend to only modify a few
files per commit, after all, so even switching branches only updates a
subset of all files).
E) Even if the sparsity patterns differ and files differ and have to
be updated, it's still proportional at most to the number of files
selected by the sparsity patterns, so it won't feel excessive or slow
to the user.
F) `sparse-checkout reapply` is pretty cheap, especially if called as
a function instead of as a subprocess


In my opinion, other than the unfortunate dual progress meters, the
only place where this new feature gets hairy is that `git
sparse-checkout reapply` (or its libified variant) now needs to know
how to "read your new form of sparsity rules".  That sounds easy at
first but in my opinion it is a bit of a mess.  The user could have
written garbage to the file and I don't know what to do with garbage,
but I think we have to do something sane.  (Give up and checkout
everything?  Give up and checkout nothing but the toplevel directory?
Ignore each line we don't understand and treat the rest as stuff that
needs to be checked out?  Can that last one even be done with the .ini
reader?)  I don't think saying "well, tough luck the user should know
better" works because we already know that merges or rebases or
checkout -m could have written garbage to those files (in the form of
conflict markers), and users will say that *we* were the ones that
filled the file with garbage.  So we have to handle it intelligently
in some fashion.

Thoughts?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] sparse-checkout: allow in-tree definitions
  2020-06-17 23:07         ` Elijah Newren
@ 2020-06-18  8:18           ` Son Luong Ngoc
  0 siblings, 0 replies; 32+ messages in thread
From: Son Luong Ngoc @ 2020-06-18  8:18 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Junio C Hamano, Derrick Stolee via GitGitGadget,
	Git Mailing List, newren, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

Hi,

On Wed, Jun 17, 2020 at 04:07:01PM -0700, Elijah Newren wrote:
> 
> Son pointed out that mercurial has a 'sparse' extension that has some
> possible ideas of things we could do here; see
> https://lore.kernel.org/git/CABPp-BGLBmWXrmPsTogyBFMgwYbHjN39oWbU=qDWroU1_fJaoQ@mail.gmail.com/
> for some further discussion.

I just want to note that you can find the latest version of FB's 'sparse'
extension here[1] and the tests for 'profile' feature could be found here[2].

Another relevant source of reading could be Google's Narrow extension for
Mercurial[3].

Cheers,
Son Luong.

[1]: https://github.com/facebookexperimental/eden/blob/master/eden/scm/edenscm/hgext/sparse.py
[2]: https://github.com/facebookexperimental/eden/blob/master/eden/scm/tests/test-sparse-profiles.t
[3]: https://bitbucket.org/Google/narrowhg/src/cb51d673e9c5820fc3da86a67f7e74b789820b4f/tests/test-merge.t#lines-63

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-06-18  8:18 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 01/10] unpack-trees: avoid array out-of-bounds error Derrick Stolee via GitGitGadget
2020-05-07 22:27   ` Junio C Hamano
2020-05-08 12:19     ` Derrick Stolee
2020-05-08 15:09       ` Junio C Hamano
2020-05-20 16:32     ` Elijah Newren
2020-05-07 13:17 ` [PATCH 02/10] sparse-checkout: move code from builtin Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 03/10] sparse-checkout: move code from unpack-trees.c Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 04/10] sparse-checkout: allow in-tree definitions Derrick Stolee via GitGitGadget
2020-05-07 22:58   ` Junio C Hamano
2020-05-08 15:40     ` Derrick Stolee
2020-05-20 17:52       ` Elijah Newren
2020-06-17 23:07         ` Elijah Newren
2020-06-18  8:18           ` Son Luong Ngoc
2020-05-07 13:17 ` [PATCH 05/10] sparse-checkout: automatically update in-tree definition Derrick Stolee via GitGitGadget
2020-05-20 16:28   ` Elijah Newren
2020-05-07 13:17 ` [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs Derrick Stolee via GitGitGadget
2020-05-20 16:40   ` Elijah Newren
2020-05-21  3:49     ` Elijah Newren
2020-05-21 17:54       ` Derrick Stolee
2020-05-07 13:17 ` [PATCH 07/10] sparse-checkout: define in-tree dependencies Derrick Stolee via GitGitGadget
2020-05-20 18:10   ` Elijah Newren
2020-05-30 17:26     ` Elijah Newren
2020-05-07 13:17 ` [PATCH 08/10] Makefile: skip git-gui if dir is missing Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 09/10] Makefile: disable GETTEXT when 'po' " Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 10/10] .sparse: add in-tree sparse-checkout for Git Derrick Stolee via GitGitGadget
2020-05-20 17:38 ` [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Elijah Newren
2020-06-17 23:14 ` Elijah Newren
2020-06-18  1:42   ` Derrick Stolee
2020-06-18  1:59     ` Elijah Newren
2020-06-18  3:01       ` Derrick Stolee
2020-06-18  5:03         ` Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).