git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode
@ 2019-08-20 15:11 Derrick Stolee via GitGitGadget
  2019-08-20 15:11 ` [PATCH 1/9] sparse-checkout: create builtin with 'list' subcommand Derrick Stolee via GitGitGadget
                   ` (9 more replies)
  0 siblings, 10 replies; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-08-20 15:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

This RFC includes a potential direction to make the sparse-checkout more
user-friendly. While there, I also present a way to use a limited set of
patterns to gain a significant performance boost in very large repositories.

Sparse-checkout is only documented as a subsection of the read-tree docs
[1], which makes the feature hard to discover. Users have trouble navigating
the feature, especially at clone time [2], and have even resorted to
creating their own helper tools [3].

This RFC attempts to solve these problems using a new builtin. Here is a
sample workflow to give a feeling for how it can work:

In an existing repo:

$ git sparse-checkout init
$ ls
myFile1.txt myFile2.txt
$ git sparse-checkout add
/myFolder/*
^D
$ ls
myFile1.txt myFile2.txt myFolder
$ ls myFolder
a.c a.h
$ git sparse-checkout disable
$ ls
hiddenFolder myFile1.txt myFile2.txt myFolder

At clone time:

$ git clone --sparse origin repo
$ cd repo
$ ls
myFile1.txt myFile2.txt
$ git sparse-checkout add
/myFolder/*
^D
$ ls
myFile1.txt myFile2.txt myFolder

Here are some more specific details:

 * git sparse-checkout init enables core.sparseCheckout and populates the
   sparse-checkout file with patterns that match only the files at root.
   
   
 * git clone learns the --sparse argument to run git sparse-checkout init 
   before the first checkout.
   
   
 * git sparse-checkout add reads patterns from stdin, one per line, then
   adds them to the sparse-checkout file and refreshes the working
   directory.
   
   
 * git sparse-checkout disable removes the patterns from the sparse-checkout
   file, disables core.sparseCheckout, and refills the working directory.
   
   
 * git sparse-checkout list lists the contents of the sparse-checkout file.
   
   

The documentation for the sparse-checkout feature can now live primarily
with the git-sparse-checkout documentation.

Cone Mode
=========

What really got me interested in this area is a performance problem. If we
have N patterns in the sparse-checkout file and M entries in the index, then
we can perform up to O(N * M) pattern checks in clear_ce_flags(). This
quadratic growth is not sustainable in a repo with 1,000+ patterns and
1,000,000+ index entries.

To solve this problem, I propose a new, more restrictive mode to
sparse-checkout: "cone mode". In this mode, all patterns are based on prefix
matches at a directory level. This can then use hashsets for fast
performance -- O(M) instead of O(N*M). My hashset implementation is based on
the virtual filesystem hook in the VFS for Git custom code [4].

In cone mode, a user specifies a list of folders which the user wants every
file inside. In addition, the cone adds all blobs that are siblings of the
folders in the directory path to that folder. This makes the directories
look "hydrated" as a user drills down to those recursively-closed folders.
These directories are called "parent" folders, as a file matches them only
if the file's immediate parent is that directory.

When building a prototype of this feature, I used a separate file to contain
the list of recursively-closed folders and built the hashsets dynamically
based on that file. In this implementation, I tried to maximize the amount
of backwards-compatibility by storing all data in the sparse-checkout file
using patterns recognized by earlier Git versions.

For example, if we add A/B/C as a recursive folder, then we add the
following patterns to the sparse-checkout file:

/*
!/*/*
/A/*
!/A/*/*
/A/B/*
!/A/B/*/*
/A/B/C/*

The alternating positive/negative patterns say "include everything in this
folder, but exclude everything another level deeper". The final pattern has
no matching negation, so is a recursively closed pattern.

Note that I have some basic warnings to try and check that the
sparse-checkout file doesn't match what would be written by a cone-mode add.
In such a case, Git writes a warning to stderr and continues with the old
pattern matching algorithm. These checks are currently very barebones, and
would need to be updated with more robust checks for things like regex
characters in the middle of the pattern. As review moves forward (and if we
don't change the data storage) then we could spend more time on this.

Thanks, -Stolee

[1] https://git-scm.com/docs/git-read-tree#_sparse_checkoutSparse-checkout
documentation in git-read-tree.

[2] https://stackoverflow.com/a/4909267/127088Is it possible to do a sparse
checkout without checking out the whole repository first?

[3] http://www.marcoyuen.com/articles/2016/06/07/git-sparse.htmlA blog post
of a user's extra "git-sparse" helper.

[4] 
https://github.com/git/git/compare/fc5fd706ff733392053e6180086a4d7f96acc2af...01204f24c5349aa2fb0c474546d768946d315dab
The virtual filesystem hook in microsoft/git.

Derrick Stolee (8):
  sparse-checkout: create builtin with 'list' subcommand
  sparse-checkout: create 'init' subcommand
  clone: add --sparse mode
  sparse-checkout: 'add' subcommand
  sparse-checkout: create 'disable' subcommand
  sparse-checkout: add 'cone' mode
  sparse-checkout: use hashmaps for cone patterns
  sparse-checkout: init and add in cone mode

Jeff Hostetler (1):
  trace2:experiment: clear_ce_flags_1

 .gitignore                            |   1 +
 Documentation/config/core.txt         |   7 +-
 Documentation/git-clone.txt           |   8 +-
 Documentation/git-read-tree.txt       |   2 +-
 Documentation/git-sparse-checkout.txt | 146 ++++++++++
 Makefile                              |   1 +
 builtin.h                             |   1 +
 builtin/clone.c                       |  27 ++
 builtin/sparse-checkout.c             | 389 ++++++++++++++++++++++++++
 cache.h                               |   8 +-
 config.c                              |  10 +-
 dir.c                                 | 154 +++++++++-
 dir.h                                 |  27 ++
 environment.c                         |   2 +-
 git.c                                 |   1 +
 t/t1091-sparse-checkout-builtin.sh    | 195 +++++++++++++
 unpack-trees.c                        |  12 +-
 17 files changed, 976 insertions(+), 15 deletions(-)
 create mode 100644 Documentation/git-sparse-checkout.txt
 create mode 100644 builtin/sparse-checkout.c
 create mode 100755 t/t1091-sparse-checkout-builtin.sh


base-commit: 5fa0f5238b0cd46cfe7f6fa76c3f526ea98148d9
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-316%2Fderrickstolee%2Fsparse-checkout%2Fupstream-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-316/derrickstolee/sparse-checkout/upstream-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/316
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/9] sparse-checkout: create builtin with 'list' subcommand
  2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
@ 2019-08-20 15:11 ` Derrick Stolee via GitGitGadget
  2019-08-23 22:30   ` Elijah Newren
  2019-08-20 15:11 ` [PATCH 2/9] sparse-checkout: create 'init' subcommand Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-08-20 15:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-checkout feature is mostly hidden to users, as its
only documentation is supplementary information in the docs for
'git read-tree'. In addition, users need to know how to edit the
.git/info/sparse-checkout file with the right patterns, then run
the appropriate 'git read-tree -mu HEAD' command. Keeping the
working directory in sync with the sparse-checkout file requires
care.

Begin an effort to make the sparse-checkout feature a porcelain
feature by creating a new 'git sparse-checkout' builtin. This
builtin will be the preferred mechanism for manipulating the
sparse-checkout file and syncing the working directory.

For now, create the basics of the builtin. Includes a single
subcommand, "git sparse-checkout list", that lists the patterns
currently in the sparse-checkout file. Test that these patterns
are parsed and written correctly to the output.

The documentation provided is adapted from the "git read-tree"
documentation with a few edits for clarity in the new context.
Extra sections are added to hint toward a future change to
a moer restricted pattern set.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                            |  1 +
 Documentation/git-read-tree.txt       |  2 +-
 Documentation/git-sparse-checkout.txt | 90 +++++++++++++++++++++++++++
 Makefile                              |  1 +
 builtin.h                             |  1 +
 builtin/sparse-checkout.c             | 90 +++++++++++++++++++++++++++
 git.c                                 |  1 +
 t/t1091-sparse-checkout-builtin.sh    | 50 +++++++++++++++
 8 files changed, 235 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/git-sparse-checkout.txt
 create mode 100644 builtin/sparse-checkout.c
 create mode 100755 t/t1091-sparse-checkout-builtin.sh

diff --git a/.gitignore b/.gitignore
index 521d8f4fb4..9411522c4f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -158,6 +158,7 @@
 /git-show-branch
 /git-show-index
 /git-show-ref
+/git-sparse-checkout
 /git-stage
 /git-stash
 /git-status
diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index d271842608..da33f84f33 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -436,7 +436,7 @@ support.
 SEE ALSO
 --------
 linkgit:git-write-tree[1]; linkgit:git-ls-files[1];
-linkgit:gitignore[5]
+linkgit:gitignore[5]; linkgit:git-sparse-checkout[1];
 
 GIT
 ---
diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
new file mode 100644
index 0000000000..ca0ca6a12f
--- /dev/null
+++ b/Documentation/git-sparse-checkout.txt
@@ -0,0 +1,90 @@
+git-sparse-checkout(1)
+=======================
+
+NAME
+----
+git-sparse-checkout - Initialize and modify the sparse-checkout
+configuration, which reduces the checkout to a set of directories
+given by a list of prefixes.
+
+
+SYNOPSIS
+--------
+[verse]
+'git sparse-checkout <subcommand> [options]'
+
+
+DESCRIPTION
+-----------
+
+Initialize and modify the sparse-checkout configuration, which reduces
+the checkout to a set of directories given by a list of prefixes.
+
+
+COMMANDS
+--------
+'list'::
+	Provide a list of the contents in the sparse-checkout file.
+
+
+SPARSE CHECKOUT
+----------------
+
+"Sparse checkout" allows populating the working directory sparsely.
+It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
+Git whether a file in the working directory is worth looking at. If
+the skip-worktree bit is set, then the file is ignored in the working
+directory. Git will not populate the contents of those files, which
+makes a sparse checkout helpful when working in a repository with many
+files, but only a few are important to the current user.
+
+The `$GIT_DIR/info/sparse-checkout` file is used to define the
+skip-worktree reference bitmap. When Git updates the working
+directory, it resets the skip-worktree bit in the index based on this
+file. If an entry
+matches a pattern in this file, skip-worktree will not be set on
+that entry. Otherwise, skip-worktree will be set.
+
+Then it compares the new skip-worktree value with the previous one. If
+skip-worktree turns from set to unset, it will add the corresponding
+file back. If it turns from unset to set, that file will be removed.
+
+## FULL PATTERN SET
+
+By default, the sparse-checkout file uses the same syntax as `.gitignore`
+files.
+
+While `$GIT_DIR/info/sparse-checkout` is usually used to specify what
+files are in, you can also specify what files are _not_ in, using
+negate patterns. For example, to remove the file `unwanted`:
+
+----------------
+/*
+!unwanted
+----------------
+
+Another tricky thing is fully repopulating the working directory when you
+no longer want sparse checkout. You cannot just disable "sparse
+checkout" because skip-worktree bits are still in the index and your working
+directory is still sparsely populated. You should re-populate the working
+directory with the `$GIT_DIR/info/sparse-checkout` file content as
+follows:
+
+----------------
+/*
+----------------
+
+Then you can disable sparse checkout. Sparse checkout support in 'git
+read-tree' and similar commands is disabled by default. You need to
+set `core.sparseCheckout` to `true` in order to have sparse checkout
+support.
+
+SEE ALSO
+--------
+
+linkgit:git-read-tree[1]
+linkgit:gitignore[5]
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index f9255344ae..d1d067ab65 100644
--- a/Makefile
+++ b/Makefile
@@ -1126,6 +1126,7 @@ BUILTIN_OBJS += builtin/shortlog.o
 BUILTIN_OBJS += builtin/show-branch.o
 BUILTIN_OBJS += builtin/show-index.o
 BUILTIN_OBJS += builtin/show-ref.o
+BUILTIN_OBJS += builtin/sparse-checkout.o
 BUILTIN_OBJS += builtin/stash.o
 BUILTIN_OBJS += builtin/stripspace.o
 BUILTIN_OBJS += builtin/submodule--helper.o
diff --git a/builtin.h b/builtin.h
index 5cf5df69f7..2b25a80cde 100644
--- a/builtin.h
+++ b/builtin.h
@@ -225,6 +225,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix);
 int cmd_show(int argc, const char **argv, const char *prefix);
 int cmd_show_branch(int argc, const char **argv, const char *prefix);
 int cmd_show_index(int argc, const char **argv, const char *prefix);
+int cmd_sparse_checkout(int argc, const char **argv, const char *prefix);
 int cmd_status(int argc, const char **argv, const char *prefix);
 int cmd_stash(int argc, const char **argv, const char *prefix);
 int cmd_stripspace(int argc, const char **argv, const char *prefix);
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
new file mode 100644
index 0000000000..6477a6ed9c
--- /dev/null
+++ b/builtin/sparse-checkout.c
@@ -0,0 +1,90 @@
+#include "builtin.h"
+#include "config.h"
+#include "dir.h"
+#include "parse-options.h"
+#include "pathspec.h"
+#include "repository.h"
+#include "run-command.h"
+#include "strbuf.h"
+
+static char const * const builtin_sparse_checkout_usage[] = {
+	N_("git sparse-checkout [list]"),
+	NULL
+};
+
+struct opts_sparse_checkout {
+	const char *subcommand;
+	int read_stdin;
+} opts;
+
+static char *get_sparse_checkout_filename(void)
+{
+	return git_pathdup("info/sparse-checkout");
+}
+
+static void write_excludes_to_file(FILE *fp, struct exclude_list *el)
+{
+	int i;
+
+	for (i = 0; i < el->nr; i++) {
+		struct exclude *x = el->excludes[i];
+
+		if (x->flags & EXC_FLAG_NEGATIVE)
+			fprintf(fp, "!");
+
+		fprintf(fp, "%s", x->pattern);
+
+		if (x->flags & EXC_FLAG_MUSTBEDIR)
+			fprintf(fp, "/");
+
+		fprintf(fp, "\n");
+	}
+}
+
+static int sparse_checkout_list(int argc, const char **argv)
+{
+	struct exclude_list el;
+	char *sparse_filename;
+	int res;
+
+	memset(&el, 0, sizeof(el));
+
+	sparse_filename = get_sparse_checkout_filename();
+	res = add_excludes_from_file_to_list(sparse_filename, "", 0, &el, NULL);
+	free(sparse_filename);
+
+	if (res < 0) {
+		warning(_("failed to parse sparse-checkout file; it may not exist"));
+		return 0;
+	}
+
+	write_excludes_to_file(stdout, &el);
+	clear_exclude_list(&el);
+
+	return 0;
+}
+
+int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
+{
+	static struct option builtin_sparse_checkout_options[] = {
+		OPT_END(),
+	};
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(builtin_sparse_checkout_usage,
+				   builtin_sparse_checkout_options);
+
+	git_config(git_default_config, NULL);
+	argc = parse_options(argc, argv, prefix,
+			     builtin_sparse_checkout_options,
+			     builtin_sparse_checkout_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (argc > 0) {
+		if (!strcmp(argv[0], "list"))
+			return sparse_checkout_list(argc, argv);
+	}
+
+	usage_with_options(builtin_sparse_checkout_usage,
+			   builtin_sparse_checkout_options);
+}
diff --git a/git.c b/git.c
index c1ee7124ed..dfb680e15d 100644
--- a/git.c
+++ b/git.c
@@ -573,6 +573,7 @@ static struct cmd_struct commands[] = {
 	{ "show-branch", cmd_show_branch, RUN_SETUP },
 	{ "show-index", cmd_show_index },
 	{ "show-ref", cmd_show_ref, RUN_SETUP },
+	{ "sparse-checkout", cmd_sparse_checkout, RUN_SETUP | NEED_WORK_TREE },
 	{ "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE },
 	/*
 	 * NEEDSWORK: Until the builtin stash is thoroughly robust and no
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
new file mode 100755
index 0000000000..ba6928c641
--- /dev/null
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -0,0 +1,50 @@
+#!/bin/sh
+
+test_description='sparse checkout builtin tests'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	git init repo &&
+	(
+		cd repo &&
+		echo "initial" >a &&
+		mkdir folder1 folder2 deep &&
+		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1/deepest &&
+		cp a folder1 &&
+		cp a folder2 &&
+		cp a deep &&
+		cp a deep/deeper1 &&
+		cp a deep/deeper2 &&
+		cp a deep/deeper1/deepest &&
+		git add . &&
+		git commit -m "initial commit"
+	)
+'
+
+test_expect_success 'git sparse-checkout list (empty)' '
+	git -C repo sparse-checkout list >list 2>err &&
+	test_line_count = 0 list &&
+	test_i18ngrep "failed to parse sparse-checkout file; it may not exist" err
+'
+
+test_expect_success 'git sparse-checkout list (populated)' '
+	test_when_finished rm -f repo/.git/info/sparse-checkout &&
+	cat >repo/.git/info/sparse-checkout <<-EOF &&
+		/folder1/*
+		/deep/
+		**/a
+		!*bin*
+	EOF
+	git -C repo sparse-checkout list >list &&
+	cat >expect <<-EOF &&
+		/folder1/*
+		/deep/
+		**/a
+		!*bin*
+	EOF
+	test_cmp expect list
+'
+
+test_done
\ No newline at end of file
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 2/9] sparse-checkout: create 'init' subcommand
  2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
  2019-08-20 15:11 ` [PATCH 1/9] sparse-checkout: create builtin with 'list' subcommand Derrick Stolee via GitGitGadget
@ 2019-08-20 15:11 ` Derrick Stolee via GitGitGadget
  2019-08-23 23:02   ` Elijah Newren
  2019-08-20 15:11 ` [PATCH 3/9] clone: add --sparse mode Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-08-20 15:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Getting started with a sparse-checkout file can be daunting. Help
users start their sparse enlistment using 'git sparse-checkout init'.
This will set 'core.sparseCheckout=true' in their config, write
an initial set of patterns to the sparse-checkout file, and update
their working directory.

Using 'git read-tree' to clear directories does not work cleanly
on Windows, so manually delete directories that are tracked by Git
before running read-tree.

The use of running another process for 'git read-tree' is likely
suboptimal, but that can be improved in a later change, if valuable.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt |   7 ++
 builtin/sparse-checkout.c             | 106 +++++++++++++++++++++++++-
 t/t1091-sparse-checkout-builtin.sh    |  40 ++++++++++
 3 files changed, 152 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index ca0ca6a12f..50c53ee60a 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -26,6 +26,13 @@ COMMANDS
 'list'::
 	Provide a list of the contents in the sparse-checkout file.
 
+'init'::
+	Enable the `core.sparseCheckout` setting. If the
+	sparse-checkout file does not exist, then populate it with
+	patterns that match every file in the root directory and
+	no other directories, then will remove all directories tracked
+	by Git. Add patterns to the sparse-checkout file to
+	repopulate the working directory.
 
 SPARSE CHECKOUT
 ----------------
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 6477a6ed9c..86d24e6295 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 
 static char const * const builtin_sparse_checkout_usage[] = {
-	N_("git sparse-checkout [list]"),
+	N_("git sparse-checkout [init|list]"),
 	NULL
 };
 
@@ -64,6 +64,108 @@ static int sparse_checkout_list(int argc, const char **argv)
 	return 0;
 }
 
+static int sc_read_tree(void)
+{
+	struct argv_array argv = ARGV_ARRAY_INIT;
+	int result = 0;
+	argv_array_pushl(&argv, "read-tree", "-m", "-u", "HEAD", NULL);
+
+	if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
+		error(_("failed to update index with new sparse-checkout paths"));
+		result = 1;
+	}
+
+	argv_array_clear(&argv);
+	return result;
+}
+
+static int sc_enable_config(void)
+{
+	struct argv_array argv = ARGV_ARRAY_INIT;
+	int result = 0;
+	argv_array_pushl(&argv, "config", "--add", "core.sparseCheckout", "true", NULL);
+
+	if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
+		error(_("failed to enable core.sparseCheckout"));
+		result = 1;
+	}
+
+	argv_array_clear(&argv);
+	return result;
+}
+
+static int delete_directory(const struct object_id *oid, struct strbuf *base,
+		const char *pathname, unsigned mode, int stage, void *context)
+{
+	struct strbuf dirname = STRBUF_INIT;
+	struct stat sb;
+
+	strbuf_addstr(&dirname, the_repository->worktree);
+	strbuf_addch(&dirname, '/');
+	strbuf_addstr(&dirname, pathname);
+
+	if (stat(dirname.buf, &sb) || !(sb.st_mode & S_IFDIR))
+		return 0;
+
+	if (remove_dir_recursively(&dirname, 0))
+		warning(_("failed to remove directory '%s'"),
+			dirname.buf);
+
+	strbuf_release(&dirname);
+	return 0;
+}
+
+static int sparse_checkout_init(int argc, const char **argv)
+{
+	struct tree *t;
+	struct object_id oid;
+	struct exclude_list el;
+	static struct pathspec pathspec;
+	char *sparse_filename;
+	FILE *fp;
+	int res;
+
+	if (sc_enable_config())
+		return 1;
+
+	memset(&el, 0, sizeof(el));
+
+	sparse_filename = get_sparse_checkout_filename();
+	res = add_excludes_from_file_to_list(sparse_filename, "", 0, &el, NULL);
+
+	/* If we already have a sparse-checkout file, use it. */
+	if (res >= 0) {
+		free(sparse_filename);
+		goto reset_dir;
+	}
+
+	/* initial mode: all blobs at root */
+	fp = fopen(sparse_filename, "w");
+	free(sparse_filename);
+	fprintf(fp, "/*\n!/*/*\n");
+	fclose(fp);
+
+	/* remove all directories in the root, if tracked by Git */
+	if (get_oid("HEAD", &oid)) {
+		/* assume we are in a fresh repo */
+		return 0;
+	}
+
+	t = parse_tree_indirect(&oid);
+
+	parse_pathspec(&pathspec, PATHSPEC_ALL_MAGIC &
+				  ~(PATHSPEC_FROMTOP | PATHSPEC_LITERAL),
+		       PATHSPEC_PREFER_CWD,
+		       "", NULL);
+
+	if (read_tree_recursive(the_repository, t, "", 0, 0, &pathspec,
+				delete_directory, NULL))
+		return 1;
+
+reset_dir:
+	return sc_read_tree();
+}
+
 int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
 {
 	static struct option builtin_sparse_checkout_options[] = {
@@ -83,6 +185,8 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
 	if (argc > 0) {
 		if (!strcmp(argv[0], "list"))
 			return sparse_checkout_list(argc, argv);
+		if (!strcmp(argv[0], "init"))
+			return sparse_checkout_init(argc, argv);
 	}
 
 	usage_with_options(builtin_sparse_checkout_usage,
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index ba6928c641..35ab84aabd 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -47,4 +47,44 @@ test_expect_success 'git sparse-checkout list (populated)' '
 	test_cmp expect list
 '
 
+test_expect_success 'git sparse-checkout init' '
+	git -C repo sparse-checkout init &&
+	cat >expect <<-EOF &&
+		/*
+		!/*/*
+	EOF
+	test_cmp expect repo/.git/info/sparse-checkout &&
+	git -C repo config --list >config &&
+	test_i18ngrep "core.sparsecheckout=true" config &&
+	ls repo >dir  &&
+	echo a >expect &&
+	test_cmp expect dir
+'
+
+test_expect_success 'git sparse-checkout list after init' '
+	git -C repo sparse-checkout list >actual &&
+	cat >expect <<-EOF &&
+		/*
+		!/*/*
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'init with existing sparse-checkout' '
+	echo "/folder1/*" >> repo/.git/info/sparse-checkout &&
+	git -C repo sparse-checkout init &&
+	cat >expect <<-EOF &&
+		/*
+		!/*/*
+		/folder1/*
+	EOF
+	test_cmp expect repo/.git/info/sparse-checkout &&
+	ls repo >dir  &&
+	cat >expect <<-EOF &&
+		a
+		folder1
+	EOF
+	test_cmp expect dir
+'
+
 test_done
\ No newline at end of file
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 3/9] clone: add --sparse mode
  2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
  2019-08-20 15:11 ` [PATCH 1/9] sparse-checkout: create builtin with 'list' subcommand Derrick Stolee via GitGitGadget
  2019-08-20 15:11 ` [PATCH 2/9] sparse-checkout: create 'init' subcommand Derrick Stolee via GitGitGadget
@ 2019-08-20 15:11 ` Derrick Stolee via GitGitGadget
  2019-08-23 23:17   ` Elijah Newren
  2019-08-20 15:11 ` [PATCH 4/9] sparse-checkout: 'add' subcommand Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-08-20 15:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When someone wants to clone a large repository, but plans to work
using a sparse-checkout file, they either need to do a full
checkout first and then reduce the patterns they included, or
clone with --no-checkout, set up their patterns, and then run
a checkout manually. This requires knowing a lot about the repo
shape and how sparse-checkout works.

Add a new '--sparse' option to 'git clone' that initializes the
sparse-checkout file to include the following patterns:

	/*
	!/*/*

These patterns include every file in the root directory, but
no directories. This allows a repo to include files like a
README or a bootstrapping script to grow enlistments from that
point.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-clone.txt        |  8 +++++++-
 builtin/clone.c                    | 27 +++++++++++++++++++++++++++
 t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 34011c2940..0fe91d2f04 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -15,7 +15,7 @@ SYNOPSIS
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch] [--no-tags]
 	  [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
-	  [--[no-]remote-submodules] [--jobs <n>] [--] <repository>
+	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -156,6 +156,12 @@ objects from the source repository into a pack in the cloned repository.
 	used, neither remote-tracking branches nor the related
 	configuration variables are created.
 
+--sparse::
+	Initialize the sparse-checkout file so the working
+	directory starts with only the files in the root
+	of the repository. The sparse-checkout file can be
+	modified to grow the working directory as needed.
+
 --mirror::
 	Set up a mirror of the source repository.  This implies `--bare`.
 	Compared to `--bare`, `--mirror` not only maps local branches of the
diff --git a/builtin/clone.c b/builtin/clone.c
index f665b28ccc..d6d49a73ff 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -60,6 +60,7 @@ static const char *real_git_dir;
 static char *option_upload_pack = "git-upload-pack";
 static int option_verbosity;
 static int option_progress = -1;
+static int option_sparse_checkout;
 static enum transport_family family;
 static struct string_list option_config = STRING_LIST_INIT_NODUP;
 static struct string_list option_required_reference = STRING_LIST_INIT_NODUP;
@@ -147,6 +148,8 @@ static struct option builtin_clone_options[] = {
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
 	OPT_BOOL(0, "remote-submodules", &option_remote_submodules,
 		    N_("any cloned submodules will use their remote-tracking branch")),
+	OPT_BOOL(0, "sparse", &option_sparse_checkout,
+		    N_("initialize sparse-checkout file to include only files at root")),
 	OPT_END()
 };
 
@@ -734,6 +737,27 @@ static void update_head(const struct ref *our, const struct ref *remote,
 	}
 }
 
+static int git_sparse_checkout_init(const char *repo)
+{
+	struct argv_array argv = ARGV_ARRAY_INIT;
+	int result = 0;
+	argv_array_pushl(&argv, "-C", repo, "sparse-checkout", "init", NULL);
+
+	/*
+	 * We must apply the setting in the current process
+	 * for the later checkout to use the sparse-checkout file.
+	 */
+	core_apply_sparse_checkout = 1;
+
+	if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
+		error(_("failed to initialize sparse-checkout"));
+		result = 1;
+	}
+
+	argv_array_clear(&argv);
+	return result;
+}
+
 static int checkout(int submodule_progress)
 {
 	struct object_id oid;
@@ -1107,6 +1131,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_required_reference.nr || option_optional_reference.nr)
 		setup_reference();
 
+	if (option_sparse_checkout && git_sparse_checkout_init(repo))
+		return 1;
+
 	remote = remote_get(option_origin);
 
 	strbuf_addf(&default_refspec, "+%s*:%s*", src_ref_prefix,
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 35ab84aabd..b7d5f15830 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -87,4 +87,17 @@ test_expect_success 'init with existing sparse-checkout' '
 	test_cmp expect dir
 '
 
+test_expect_success 'clone --sparse' '
+	git clone --sparse repo clone &&
+	git -C clone sparse-checkout list >actual &&
+	cat >expect <<-EOF &&
+		/*
+		!/*/*
+	EOF
+	test_cmp expect actual &&
+	ls clone >dir &&
+	echo a >expect &&
+	test_cmp expect dir
+'
+
 test_done
\ No newline at end of file
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 4/9] sparse-checkout: 'add' subcommand
  2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2019-08-20 15:11 ` [PATCH 3/9] clone: add --sparse mode Derrick Stolee via GitGitGadget
@ 2019-08-20 15:11 ` Derrick Stolee via GitGitGadget
  2019-08-23 23:30   ` Elijah Newren
  2019-08-20 15:11 ` [PATCH 6/9] trace2:experiment: clear_ce_flags_1 Jeff Hostetler via GitGitGadget
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-08-20 15:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git sparse-checkout add' subcommand takes a list of patterns
over stdin and writes them to the sparse-checkout file. Then, it
updates the working directory using 'git read-tree -mu HEAD'.

Note: if a user adds a negative pattern that would lead to the
removal of a non-empty directory, then Git may not delete that
directory (on Windows).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt |  4 ++++
 builtin/sparse-checkout.c             | 32 ++++++++++++++++++++++++++-
 t/t1091-sparse-checkout-builtin.sh    | 20 +++++++++++++++++
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index 50c53ee60a..6f540a3443 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -34,6 +34,10 @@ COMMANDS
 	by Git. Add patterns to the sparse-checkout file to
 	repopulate the working directory.
 
+'add'::
+	Add a set of patterns to the sparse-checkout file, as given over
+	stdin. Updates the working directory to match the new patterns.
+
 SPARSE CHECKOUT
 ----------------
 
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 86d24e6295..ec6134fecc 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 
 static char const * const builtin_sparse_checkout_usage[] = {
-	N_("git sparse-checkout [init|list]"),
+	N_("git sparse-checkout [init|add|list]"),
 	NULL
 };
 
@@ -166,6 +166,34 @@ static int sparse_checkout_init(int argc, const char **argv)
 	return sc_read_tree();
 }
 
+static int sparse_checkout_add(int argc, const char **argv)
+{
+	struct exclude_list el;
+	char *sparse_filename;
+	FILE *fp;
+	struct strbuf line = STRBUF_INIT;
+
+	memset(&el, 0, sizeof(el));
+
+	sparse_filename = get_sparse_checkout_filename();
+	add_excludes_from_file_to_list(sparse_filename, "", 0, &el, NULL);
+
+	fp = fopen(sparse_filename, "w");
+	write_excludes_to_file(fp, &el);
+
+	while (!strbuf_getline(&line, stdin)) {
+		strbuf_trim(&line);
+		fprintf(fp, "%s\n", line.buf);
+	}
+
+	fclose(fp);
+	free(sparse_filename);
+
+	clear_exclude_list(&el);
+
+	return sc_read_tree();
+}
+
 int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
 {
 	static struct option builtin_sparse_checkout_options[] = {
@@ -187,6 +215,8 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
 			return sparse_checkout_list(argc, argv);
 		if (!strcmp(argv[0], "init"))
 			return sparse_checkout_init(argc, argv);
+		if (!strcmp(argv[0], "add"))
+			return sparse_checkout_add(argc, argv);
 	}
 
 	usage_with_options(builtin_sparse_checkout_usage,
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index b7d5f15830..499bd8d6d0 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -100,4 +100,24 @@ test_expect_success 'clone --sparse' '
 	test_cmp expect dir
 '
 
+test_expect_success 'add to existing sparse-checkout' '
+	echo "/folder2/*" | git -C repo sparse-checkout add &&
+	cat >expect <<-EOF &&
+		/*
+		!/*/*
+		/folder1/*
+		/folder2/*
+	EOF
+	git -C repo sparse-checkout list >actual &&
+	test_cmp expect actual &&
+	test_cmp expect repo/.git/info/sparse-checkout &&
+	ls repo >dir  &&
+	cat >expect <<-EOF &&
+		a
+		folder1
+		folder2
+	EOF
+	test_cmp expect dir
+'
+
 test_done
\ No newline at end of file
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 6/9] trace2:experiment: clear_ce_flags_1
  2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2019-08-20 15:11 ` [PATCH 4/9] sparse-checkout: 'add' subcommand Derrick Stolee via GitGitGadget
@ 2019-08-20 15:11 ` Jeff Hostetler via GitGitGadget
  2019-08-24  0:08   ` Elijah Newren
  2019-08-20 15:11 ` [PATCH 5/9] sparse-checkout: create 'disable' subcommand Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2019-08-20 15:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

The clear_ce_flags_1 method is used by many types of calls to
unpack_trees(). Add trace2 regions around the method, including
some flag information, so we can get granular performance data
during experiments.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 unpack-trees.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 62276d4fef..8c3b5e8849 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1397,15 +1397,23 @@ static int clear_ce_flags(struct index_state *istate,
 			  struct exclude_list *el)
 {
 	static struct strbuf prefix = STRBUF_INIT;
+	char label[100];
+	int rval;
 
 	strbuf_reset(&prefix);
 
-	return clear_ce_flags_1(istate,
+	xsnprintf(label, sizeof(label), "clear_ce_flags(0x%08lx,0x%08lx)",
+		  (unsigned long)select_mask, (unsigned long)clear_mask);
+	trace2_region_enter("exp", label, the_repository);
+	rval = clear_ce_flags_1(istate,
 				istate->cache,
 				istate->cache_nr,
 				&prefix,
 				select_mask, clear_mask,
 				el, 0);
+	trace2_region_leave("exp", label, the_repository);
+
+	return rval;
 }
 
 /*
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 5/9] sparse-checkout: create 'disable' subcommand
  2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2019-08-20 15:11 ` [PATCH 6/9] trace2:experiment: clear_ce_flags_1 Jeff Hostetler via GitGitGadget
@ 2019-08-20 15:11 ` Derrick Stolee via GitGitGadget
  2019-08-23 23:50   ` Elijah Newren
  2019-08-20 15:11 ` [PATCH 7/9] sparse-checkout: add 'cone' mode Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-08-20 15:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The instructions for disabling a sparse-checkout to a full
working directory are complicated and non-intuitive. Add a
subcommand, 'git sparse-checkout disable', to perform those
steps for the user.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt | 26 +++++++---------
 builtin/sparse-checkout.c             | 45 ++++++++++++++++++++++++---
 t/t1091-sparse-checkout-builtin.sh    | 15 +++++++++
 3 files changed, 67 insertions(+), 19 deletions(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index 6f540a3443..de04b768ae 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -38,6 +38,10 @@ COMMANDS
 	Add a set of patterns to the sparse-checkout file, as given over
 	stdin. Updates the working directory to match the new patterns.
 
+'disable'::
+	Remove the sparse-checkout file, set `core.sparseCheckout` to
+	`false`, and restore the working directory to include all files.
+
 SPARSE CHECKOUT
 ----------------
 
@@ -60,6 +64,13 @@ Then it compares the new skip-worktree value with the previous one. If
 skip-worktree turns from set to unset, it will add the corresponding
 file back. If it turns from unset to set, that file will be removed.
 
+To repopulate the working directory with all files, use the
+`git sparse-checkout disable` command.
+
+Sparse checkout support in 'git read-tree' and similar commands is
+disabled by default. You need to set `core.sparseCheckout` to `true`
+in order to have sparse checkout support.
+
 ## FULL PATTERN SET
 
 By default, the sparse-checkout file uses the same syntax as `.gitignore`
@@ -74,21 +85,6 @@ negate patterns. For example, to remove the file `unwanted`:
 !unwanted
 ----------------
 
-Another tricky thing is fully repopulating the working directory when you
-no longer want sparse checkout. You cannot just disable "sparse
-checkout" because skip-worktree bits are still in the index and your working
-directory is still sparsely populated. You should re-populate the working
-directory with the `$GIT_DIR/info/sparse-checkout` file content as
-follows:
-
-----------------
-/*
-----------------
-
-Then you can disable sparse checkout. Sparse checkout support in 'git
-read-tree' and similar commands is disabled by default. You need to
-set `core.sparseCheckout` to `true` in order to have sparse checkout
-support.
 
 SEE ALSO
 --------
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index ec6134fecc..8f97c27ec7 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 
 static char const * const builtin_sparse_checkout_usage[] = {
-	N_("git sparse-checkout [init|add|list]"),
+	N_("git sparse-checkout [init|add|list|disable]"),
 	NULL
 };
 
@@ -79,11 +79,24 @@ static int sc_read_tree(void)
 	return result;
 }
 
-static int sc_enable_config(void)
+static int sc_set_config(int mode)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
 	int result = 0;
-	argv_array_pushl(&argv, "config", "--add", "core.sparseCheckout", "true", NULL);
+	argv_array_pushl(&argv, "config", "--add", "core.sparseCheckout", NULL);
+
+	switch (mode) {
+	case 1:
+		argv_array_pushl(&argv, "true", NULL);
+		break;
+
+	case 0:
+		argv_array_pushl(&argv, "false", NULL);
+		break;
+
+	default:
+		die(_("invalid config mode"));
+	}
 
 	if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
 		error(_("failed to enable core.sparseCheckout"));
@@ -125,7 +138,7 @@ static int sparse_checkout_init(int argc, const char **argv)
 	FILE *fp;
 	int res;
 
-	if (sc_enable_config())
+	if (sc_set_config(1))
 		return 1;
 
 	memset(&el, 0, sizeof(el));
@@ -194,6 +207,28 @@ static int sparse_checkout_add(int argc, const char **argv)
 	return sc_read_tree();
 }
 
+static int sparse_checkout_disable(int argc, const char **argv)
+{
+	char *sparse_filename;
+	FILE *fp;
+
+	if (sc_set_config(1))
+		die(_("failed to change config"));
+
+	sparse_filename = get_sparse_checkout_filename();
+	fp = fopen(sparse_filename, "w");
+	fprintf(fp, "/*\n");
+	fclose(fp);
+
+	if (sc_read_tree())
+		die(_("error while refreshing working directory"));
+
+	unlink(sparse_filename);
+	free(sparse_filename);
+
+	return sc_set_config(0);
+}
+
 int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
 {
 	static struct option builtin_sparse_checkout_options[] = {
@@ -217,6 +252,8 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
 			return sparse_checkout_init(argc, argv);
 		if (!strcmp(argv[0], "add"))
 			return sparse_checkout_add(argc, argv);
+		if (!strcmp(argv[0], "disable"))
+			return sparse_checkout_disable(argc, argv);
 	}
 
 	usage_with_options(builtin_sparse_checkout_usage,
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 499bd8d6d0..68ca63a6f6 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -120,4 +120,19 @@ test_expect_success 'add to existing sparse-checkout' '
 	test_cmp expect dir
 '
 
+test_expect_success 'sparse-checkout disable' '
+	git -C repo sparse-checkout disable &&
+	test_path_is_missing repo/.git/info/sparse-checkout &&
+	git -C repo config --list >config &&
+	test_i18ngrep "core.sparsecheckout=false" config &&
+	ls repo >dir &&
+	cat >expect <<-EOF &&
+		a
+		deep
+		folder1
+		folder2
+	EOF
+	test_cmp expect dir
+'
+
 test_done
\ No newline at end of file
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 7/9] sparse-checkout: add 'cone' mode
  2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2019-08-20 15:11 ` [PATCH 5/9] sparse-checkout: create 'disable' subcommand Derrick Stolee via GitGitGadget
@ 2019-08-20 15:11 ` Derrick Stolee via GitGitGadget
  2019-08-24  0:31   ` Elijah Newren
  2019-08-20 15:11 ` [PATCH 8/9] sparse-checkout: use hashmaps for cone patterns Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-08-20 15:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse-checkout feature can have quadratic performance as
the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can
be very significant.

Create a new 'cone' mode for the core.sparseCheckout config
option, and adjust the parser to set an appropriate enum value.

While adjusting the type of this variable, rename it from
core_apply_sparse_checkout to core_sparse_checkout. This will
help avoid parallel changes from hitting type issues, and we
can guarantee that all uses now consider the enum values instead
of the int value.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/core.txt         |  7 ++--
 Documentation/git-sparse-checkout.txt | 50 +++++++++++++++++++++++++++
 builtin/clone.c                       |  2 +-
 builtin/sparse-checkout.c             | 16 +++++----
 cache.h                               |  8 ++++-
 config.c                              | 10 +++++-
 environment.c                         |  2 +-
 t/t1091-sparse-checkout-builtin.sh    | 14 ++++++++
 unpack-trees.c                        |  2 +-
 9 files changed, 98 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 75538d27e7..9b8ab2a6d4 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -591,8 +591,11 @@ core.multiPackIndex::
 	multi-pack-index design document].
 
 core.sparseCheckout::
-	Enable "sparse checkout" feature. See section "Sparse checkout" in
-	linkgit:git-read-tree[1] for more information.
+	Enable "sparse checkout" feature. If "false", then sparse-checkout
+	is disabled. If "true", then sparse-checkout is enabled with the full
+	.gitignore pattern set. If "cone", then sparse-checkout is enabled with
+	a restricted pattern set. See linkgit:git-sparse-checkout[1] for more
+	information.
 
 core.abbrev::
 	Set the length object names are abbreviated to.  If
diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index de04b768ae..463319055b 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -86,6 +86,56 @@ negate patterns. For example, to remove the file `unwanted`:
 ----------------
 
 
+## CONE PATTERN SET
+
+The full pattern set allows for arbitrary pattern matches and complicated
+inclusion/exclusion rules. These can result in O(N*M) pattern matches when
+updating the index, where N is the number of patterns and M is the number
+of paths in the index. To combat this performance issue, a more restricted
+pattern set is allowed when `core.spareCheckout` is set to `cone`.
+
+The accepted patterns in the cone pattern set are:
+
+1. *Recursive:* All paths inside a directory are included.
+
+2. *Parent:* All files immediately inside a directory are included.
+
+In addition to the above two patterns, we also expect that all files in the
+root directory are included. If a recursive pattern is added, then all
+leading directories are added as parent patterns.
+
+By default, when running `git sparse-checkout init`, the root directory is
+added as a parent pattern. At this point, the sparse-checkout file contains
+the following patterns:
+
+```
+/*
+!/*/*
+```
+
+This says "include everything in root, but nothing two levels below root."
+If we then add the folder `A/B/C` as a recursive pattern, the folders `A` and
+`A/B` are added as parent patterns. The resulting sparse-checkout file is
+now
+
+```
+/*
+!/*/*
+/A/*
+!/A/*/*
+/A/B/*
+!/A/B/*/*
+/A/B/C/*
+```
+
+Here, order matters, so the negative patterns are overridden by the positive
+patterns that appear lower in the file.
+
+If `core.sparseCheckout=cone`, then Git will parse the sparse-checkout file
+expecting patterns of these types. Git will warn if the patterns do not match.
+If the patterns do match the expected format, then Git will use faster hash-
+based algorithms to compute inclusion in the sparse-checkout.
+
 SEE ALSO
 --------
 
diff --git a/builtin/clone.c b/builtin/clone.c
index d6d49a73ff..763898ada5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -747,7 +747,7 @@ static int git_sparse_checkout_init(const char *repo)
 	 * We must apply the setting in the current process
 	 * for the later checkout to use the sparse-checkout file.
 	 */
-	core_apply_sparse_checkout = 1;
+	core_sparse_checkout = SPARSE_CHECKOUT_FULL;
 
 	if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
 		error(_("failed to initialize sparse-checkout"));
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 8f97c27ec7..77e5235720 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -79,18 +79,22 @@ static int sc_read_tree(void)
 	return result;
 }
 
-static int sc_set_config(int mode)
+static int sc_set_config(enum sparse_checkout_mode mode)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
 	int result = 0;
 	argv_array_pushl(&argv, "config", "--add", "core.sparseCheckout", NULL);
 
 	switch (mode) {
-	case 1:
+	case SPARSE_CHECKOUT_FULL:
 		argv_array_pushl(&argv, "true", NULL);
 		break;
 
-	case 0:
+	case SPARSE_CHECKOUT_CONE:
+		argv_array_pushl(&argv, "cone", NULL);
+		break;
+
+	case SPARSE_CHECKOUT_NONE:
 		argv_array_pushl(&argv, "false", NULL);
 		break;
 
@@ -138,7 +142,7 @@ static int sparse_checkout_init(int argc, const char **argv)
 	FILE *fp;
 	int res;
 
-	if (sc_set_config(1))
+	if (sc_set_config(SPARSE_CHECKOUT_FULL))
 		return 1;
 
 	memset(&el, 0, sizeof(el));
@@ -212,7 +216,7 @@ static int sparse_checkout_disable(int argc, const char **argv)
 	char *sparse_filename;
 	FILE *fp;
 
-	if (sc_set_config(1))
+	if (sc_set_config(SPARSE_CHECKOUT_FULL))
 		die(_("failed to change config"));
 
 	sparse_filename = get_sparse_checkout_filename();
@@ -226,7 +230,7 @@ static int sparse_checkout_disable(int argc, const char **argv)
 	unlink(sparse_filename);
 	free(sparse_filename);
 
-	return sc_set_config(0);
+	return sc_set_config(SPARSE_CHECKOUT_NONE);
 }
 
 int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
diff --git a/cache.h b/cache.h
index b1da1ab08f..4426816ca1 100644
--- a/cache.h
+++ b/cache.h
@@ -865,12 +865,18 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
-extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
 extern const char *core_fsmonitor;
 
+enum sparse_checkout_mode {
+	SPARSE_CHECKOUT_NONE = 0,
+	SPARSE_CHECKOUT_FULL = 1,
+	SPARSE_CHECKOUT_CONE = 2,
+};
+enum sparse_checkout_mode core_sparse_checkout;
+
 /*
  * Include broken refs in all ref iterations, which will
  * generally choke dangerous operations rather than letting
diff --git a/config.c b/config.c
index 3900e4947b..15b7a20dd9 100644
--- a/config.c
+++ b/config.c
@@ -1360,7 +1360,15 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.sparsecheckout")) {
-		core_apply_sparse_checkout = git_config_bool(var, value);
+		int result = git_parse_maybe_bool(value);
+
+		if (result < 0) {
+			core_sparse_checkout = SPARSE_CHECKOUT_NONE;
+
+			if (!strcasecmp(value, "cone"))
+				core_sparse_checkout = SPARSE_CHECKOUT_CONE;
+		} else
+			core_sparse_checkout = result;
 		return 0;
 	}
 
diff --git a/environment.c b/environment.c
index 89af47cb85..cc12e30bd6 100644
--- a/environment.c
+++ b/environment.c
@@ -68,7 +68,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED;
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
-int core_apply_sparse_checkout;
+enum sparse_checkout_mode core_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 68ca63a6f6..8cc377b839 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -120,6 +120,20 @@ test_expect_success 'add to existing sparse-checkout' '
 	test_cmp expect dir
 '
 
+test_expect_success 'cone mode: match patterns' '
+	git -C repo config --replace-all core.sparseCheckout cone &&
+	rm -rf repo/a repo/folder1 repo/folder2 &&
+	git -C repo read-tree -mu HEAD &&
+	git -C repo reset --hard &&
+	ls repo >dir  &&
+	cat >expect <<-EOF &&
+		a
+		folder1
+		folder2
+	EOF
+	test_cmp expect dir
+'
+
 test_expect_success 'sparse-checkout disable' '
 	git -C repo sparse-checkout disable &&
 	test_path_is_missing repo/.git/info/sparse-checkout &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 8c3b5e8849..289c62305f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1468,7 +1468,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	trace_performance_enter();
 	memset(&el, 0, sizeof(el));
-	if (!core_apply_sparse_checkout || !o->update)
+	if (!core_sparse_checkout || !o->update)
 		o->skip_sparse_checkout = 1;
 	if (!o->skip_sparse_checkout) {
 		char *sparse = git_pathdup("info/sparse-checkout");
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 8/9] sparse-checkout: use hashmaps for cone patterns
  2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2019-08-20 15:11 ` [PATCH 7/9] sparse-checkout: add 'cone' mode Derrick Stolee via GitGitGadget
@ 2019-08-20 15:11 ` Derrick Stolee via GitGitGadget
  2019-08-24  4:56   ` Elijah Newren
  2019-08-20 15:11 ` [PATCH 9/9] sparse-checkout: init and add in cone mode Derrick Stolee via GitGitGadget
  2019-08-21 21:52 ` [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Elijah Newren
  9 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-08-20 15:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The parent and recursive patterns allowed by the "cone mode"
option in sparse-checkout are restrictive enough that we
can avoid using the regex parsing. Everything is based on
prefix matches, so we can use hashsets to store the prefixes
from the sparse-checkout file. When checking a path, we can
strip path entries from the path and check the hashset for
an exact match.

As a test, I created a cone-mode sparse-checkout file for the
Linux repository that actually includes every file. This was
constructed by taking every folder in the Linux repo and creating
the pattern pairs here:

	/$folder/*
	!/$folder/*/*

This resulted in a sparse-checkout file sith 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following
performance:

	core.sparseCheckout=false: 0.21 s (0.00 s)
	 core.sparseCheckout=true: 3.75 s (3.50 s)
	 core.sparseCheckout=cone: 0.23 s (0.01 s)

The times in parentheses above correspond to the time spent
in the first clear_ce_flags() call, according to the trace2
performance traces.

While this example is contrived, it demonstrates how these
patterns can slow the sparse-checkout feature.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt |   1 -
 dir.c                                 | 154 +++++++++++++++++++++++++-
 dir.h                                 |  27 +++++
 t/t1091-sparse-checkout-builtin.sh    |   8 ++
 4 files changed, 183 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index 463319055b..7ade827370 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -85,7 +85,6 @@ negate patterns. For example, to remove the file `unwanted`:
 !unwanted
 ----------------
 
-
 ## CONE PATTERN SET
 
 The full pattern set allows for arbitrary pattern matches and complicated
diff --git a/dir.c b/dir.c
index d021c908e5..2c5ff89a72 100644
--- a/dir.c
+++ b/dir.c
@@ -599,6 +599,99 @@ void parse_exclude_pattern(const char **pattern,
 	*patternlen = len;
 }
 
+static int el_hashmap_cmp(const void *unused_cmp_data,
+			  const void *a, const void *b, const void *key)
+{
+	const struct exclude_entry *ee1 = a;
+	const struct exclude_entry *ee2 = b;
+
+	return strncmp(ee1->pattern, ee2->pattern, ee1->patternlen);
+}
+
+static void add_exclude_to_hashsets(struct exclude_list *el, struct exclude *x)
+{
+	struct exclude_entry *e;
+	char *truncated;
+	char *data = NULL;
+
+	if (!el->use_cone_patterns)
+		return;
+
+	if (x->patternlen >= 4 &&
+	    !strcmp(x->pattern + x->patternlen - 4, "/*/*")) {
+		if (!(x->flags & EXC_FLAG_NEGATIVE)) {
+			/* Not a cone pattern. */
+			el->use_cone_patterns = 0;
+			warning(_("unrecognized pattern: '%s'"), x->pattern);
+			goto clear_hashmaps;
+		}
+
+		truncated = xstrdup(x->pattern);
+		truncated[x->patternlen - 4] = 0;
+
+		e = xmalloc(sizeof(struct exclude_entry));
+		e->pattern = truncated;
+		e->patternlen = x->patternlen - 4;
+		hashmap_entry_init(e, memhash(e->pattern, e->patternlen));
+
+		if (!hashmap_get(&el->recursive_hashmap, e, NULL)) {
+			/* We did not see the "parent" included */
+			warning(_("unrecognized negative pattern: '%s'"), x->pattern);
+			free(truncated);
+			goto clear_hashmaps;
+		}
+
+		hashmap_add(&el->parent_hashmap, e);
+		hashmap_remove(&el->recursive_hashmap, e, &data);
+		free(data);
+		return;
+	}
+
+	if (x->patternlen >= 2 &&
+	    !strcmp(x->pattern + x->patternlen - 2, "/*")) {
+		if (x->flags & EXC_FLAG_NEGATIVE) {
+			warning(_("unrecognized negative pattern: '%s'"), x->pattern);
+			goto clear_hashmaps;
+		}
+
+		e = xmalloc(sizeof(struct exclude_entry));
+
+		truncated = xstrdup(x->pattern);
+		truncated[x->patternlen - 2] = 0;
+		e->pattern = truncated;
+		e->patternlen = x->patternlen - 2;
+		hashmap_entry_init(e, memhash(e->pattern, e->patternlen));
+
+		hashmap_add(&el->recursive_hashmap, e);
+
+		if (hashmap_get(&el->parent_hashmap, e, NULL)) {
+			/* we already included this at the parent level */
+			warning(_("your sparse-checkout file may have issues: pattern '%s' is repeated"),
+				x->pattern);
+			hashmap_remove(&el->parent_hashmap, e, &data);
+			free(data);
+		}
+		return;
+	}
+
+clear_hashmaps:
+	hashmap_free(&el->parent_hashmap, 1);
+	hashmap_free(&el->recursive_hashmap, 1);
+	el->use_cone_patterns = 0;
+}
+
+static int hashmap_contains_path(struct hashmap *map,
+				 struct strbuf *pattern)
+{
+	struct exclude_entry e;
+
+	/* Check straight mapping */
+	e.pattern = pattern->buf;
+	e.patternlen = pattern->len;
+	hashmap_entry_init(&e, memhash(e.pattern, e.patternlen));
+	return !!hashmap_get(map, &e, NULL);
+}
+
 void add_exclude(const char *string, const char *base,
 		 int baselen, struct exclude_list *el, int srcpos)
 {
@@ -623,6 +716,8 @@ void add_exclude(const char *string, const char *base,
 	ALLOC_GROW(el->excludes, el->nr + 1, el->alloc);
 	el->excludes[el->nr++] = x;
 	x->el = el;
+
+	add_exclude_to_hashsets(el, x);
 }
 
 static int read_skip_worktree_file_from_index(const struct index_state *istate,
@@ -848,6 +943,10 @@ static int add_excludes_from_buffer(char *buf, size_t size,
 	int i, lineno = 1;
 	char *entry;
 
+	el->use_cone_patterns = core_sparse_checkout == SPARSE_CHECKOUT_CONE ? 1 : 0;
+	hashmap_init(&el->recursive_hashmap, el_hashmap_cmp, NULL, 0);
+	hashmap_init(&el->parent_hashmap, el_hashmap_cmp, NULL, 0);
+
 	el->filebuf = buf;
 
 	if (skip_utf8_bom(&buf, size))
@@ -1070,18 +1169,61 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 
 /*
  * Scan the list and let the last match determine the fate.
- * Return 1 for exclude, 0 for include and -1 for undecided.
+ * Return 0 for exclude, 1 for include and -1 for undecided.
  */
 int is_excluded_from_list(const char *pathname,
 			  int pathlen, const char *basename, int *dtype,
 			  struct exclude_list *el, struct index_state *istate)
 {
 	struct exclude *exclude;
-	exclude = last_exclude_matching_from_list(pathname, pathlen, basename,
-						  dtype, el, istate);
-	if (exclude)
-		return exclude->flags & EXC_FLAG_NEGATIVE ? 0 : 1;
-	return -1; /* undecided */
+	struct strbuf parent_pathname = STRBUF_INIT;
+	int result = 0;
+	const char *slash_pos;
+
+	if (!el->use_cone_patterns) {
+		exclude = last_exclude_matching_from_list(pathname, pathlen, basename,
+								dtype, el, istate);
+
+		if (exclude)
+			return exclude->flags & EXC_FLAG_NEGATIVE ? 0 : 1;
+
+		return -1; /* undecided */
+	}
+
+	strbuf_addch(&parent_pathname, '/');
+	strbuf_add(&parent_pathname, pathname, pathlen);
+	slash_pos = strrchr(parent_pathname.buf, '/');
+
+	if (slash_pos == parent_pathname.buf) {
+		/* include every file in root */
+		result = 1;
+		goto done;
+	}
+
+	strbuf_setlen(&parent_pathname, slash_pos - parent_pathname.buf);
+
+	if (hashmap_contains_path(&el->parent_hashmap, &parent_pathname)) {
+		result = 1;
+		goto done;
+	}
+
+	while (parent_pathname.len) {
+		if (hashmap_contains_path(&el->recursive_hashmap,
+					  &parent_pathname)) {
+			result = -1;
+			goto done;
+		}
+
+		slash_pos = strrchr(parent_pathname.buf, '/');
+		if (slash_pos == parent_pathname.buf)
+			break;
+
+		strbuf_setlen(&parent_pathname, slash_pos - parent_pathname.buf);
+	}
+
+done:
+	strbuf_release(&parent_pathname);
+	return result;
 }
 
 static struct exclude *last_exclude_matching_from_lists(struct dir_struct *dir,
diff --git a/dir.h b/dir.h
index 680079bbe3..2d3356d1c0 100644
--- a/dir.h
+++ b/dir.h
@@ -4,6 +4,7 @@
 /* See Documentation/technical/api-directory-listing.txt */
 
 #include "cache.h"
+#include "hashmap.h"
 #include "strbuf.h"
 
 struct dir_entry {
@@ -37,6 +38,13 @@ struct exclude {
 	int srcpos;
 };
 
+/* used for hashmaps for cone patterns */
+struct exclude_entry {
+	struct hashmap_entry ent;
+	char *pattern;
+	size_t patternlen;
+};
+
 /*
  * Each excludes file will be parsed into a fresh exclude_list which
  * is appended to the relevant exclude_list_group (either EXC_DIRS or
@@ -55,6 +63,25 @@ struct exclude_list {
 	const char *src;
 
 	struct exclude **excludes;
+
+	/*
+	 * While scanning the excludes, we attempt to match the patterns
+	 * with a more restricted set that allows us to use hashsets for
+	 * matching logic, which is faster than the linear lookup in the
+	 * excludes array above. If non-zero, that check succeeded.
+	 */
+	unsigned use_cone_patterns;
+
+	/*
+	 * Stores paths where everything starting with those paths
+	 * is included.
+	 */
+	struct hashmap recursive_hashmap;
+
+	/*
+	 * Used to check single-level parents of blobs.
+	 */
+	struct hashmap parent_hashmap;
 };
 
 /*
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 8cc377b839..60f10864a1 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -134,6 +134,14 @@ test_expect_success 'cone mode: match patterns' '
 	test_cmp expect dir
 '
 
+test_expect_success 'cone mode: warn on bad pattern' '
+	test_when_finished mv sparse-checkout repo/.git/info &&
+	cp repo/.git/info/sparse-checkout . &&
+	echo "!/deep/deeper/*" >>repo/.git/info/sparse-checkout &&
+	git -C repo read-tree -mu HEAD 2>err &&
+	test_i18ngrep "unrecognized negative pattern" err
+'
+
 test_expect_success 'sparse-checkout disable' '
 	git -C repo sparse-checkout disable &&
 	test_path_is_missing repo/.git/info/sparse-checkout &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 9/9] sparse-checkout: init and add in cone mode
  2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2019-08-20 15:11 ` [PATCH 8/9] sparse-checkout: use hashmaps for cone patterns Derrick Stolee via GitGitGadget
@ 2019-08-20 15:11 ` Derrick Stolee via GitGitGadget
  2019-08-24  5:07   ` Elijah Newren
  2019-08-21 21:52 ` [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Elijah Newren
  9 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-08-20 15:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

To make the cone pattern set easy to use, update the behavior of
'git sparse-checkout [init|add]'.

Add '--cone' flag to 'git sparse-checkout init' to set the config
option 'core.sparseCheckout=cone'.

When running 'git sparse-checkout add' in cone mode, a user only
needs to supply a list of recursive folder matches. Git will
automatically add the necessary parent matches for the leading
directories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c          | 134 +++++++++++++++++++++++++++--
 t/t1091-sparse-checkout-builtin.sh |  35 ++++++++
 2 files changed, 164 insertions(+), 5 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 77e5235720..0a4e101ddd 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -6,15 +6,22 @@
 #include "repository.h"
 #include "run-command.h"
 #include "strbuf.h"
+#include "string-list.h"
 
 static char const * const builtin_sparse_checkout_usage[] = {
 	N_("git sparse-checkout [init|add|list|disable]"),
 	NULL
 };
 
+static const char * const builtin_sparse_checkout_init_usage[] = {
+	N_("git sparse-checkout init [--cone]"),
+	NULL
+};
+
 struct opts_sparse_checkout {
 	const char *subcommand;
 	int read_stdin;
+	int cone;
 } opts;
 
 static char *get_sparse_checkout_filename(void)
@@ -41,6 +48,60 @@ static void write_excludes_to_file(FILE *fp, struct exclude_list *el)
 	}
 }
 
+static void write_cone_to_file(FILE *fp, struct exclude_list *el)
+{
+	int i;
+	struct exclude_entry *entry;
+	struct hashmap_iter iter;
+	struct string_list sl = STRING_LIST_INIT_DUP;
+
+	hashmap_iter_init(&el->parent_hashmap, &iter);
+	while ((entry = hashmap_iter_next(&iter))) {
+		char *pattern = xstrdup(entry->pattern);
+		char *converted = pattern;
+		if (pattern[0] == '/')
+			converted++;
+		if (pattern[entry->patternlen - 1] == '/')
+			pattern[entry->patternlen - 1] = 0;
+		string_list_insert(&sl, converted);
+		free(pattern);
+	}
+
+	string_list_sort(&sl);
+	string_list_remove_duplicates(&sl, 0);
+
+	for (i = 0; i < sl.nr; i++) {
+		char *pattern = sl.items[i].string;
+
+		if (!strcmp(pattern, ""))
+			fprintf(fp, "/*\n!/*/*\n");
+		else
+			fprintf(fp, "/%s/*\n!/%s/*/*\n", pattern, pattern);
+	}
+
+	string_list_clear(&sl, 0);
+
+	hashmap_iter_init(&el->recursive_hashmap, &iter);
+	while ((entry = hashmap_iter_next(&iter))) {
+		char *pattern = xstrdup(entry->pattern);
+		char *converted = pattern;
+		if (pattern[0] == '/')
+			converted++;
+		if (pattern[entry->patternlen - 1] == '/')
+			pattern[entry->patternlen - 1] = 0;
+		string_list_insert(&sl, converted);
+		free(pattern);
+	}
+
+	string_list_sort(&sl);
+	string_list_remove_duplicates(&sl, 0);
+
+	for (i = 0; i < sl.nr; i++) {
+		char *pattern = sl.items[i].string;
+		fprintf(fp, "/%s/*\n", pattern);
+	}
+}
+
 static int sparse_checkout_list(int argc, const char **argv)
 {
 	struct exclude_list el;
@@ -141,8 +202,21 @@ static int sparse_checkout_init(int argc, const char **argv)
 	char *sparse_filename;
 	FILE *fp;
 	int res;
+	enum sparse_checkout_mode mode;
 
-	if (sc_set_config(SPARSE_CHECKOUT_FULL))
+	static struct option builtin_sparse_checkout_init_options[] = {
+		OPT_BOOL(0, "cone", &opts.cone,
+			 N_("initialize the sparse-checkout in cone mode")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_sparse_checkout_init_options,
+			     builtin_sparse_checkout_init_usage, 0);
+
+	mode = opts.cone ? SPARSE_CHECKOUT_CONE : SPARSE_CHECKOUT_FULL;
+
+	if (sc_set_config(mode))
 		return 1;
 
 	memset(&el, 0, sizeof(el));
@@ -183,6 +257,34 @@ static int sparse_checkout_init(int argc, const char **argv)
 	return sc_read_tree();
 }
 
+static void insert_recursive_pattern(struct exclude_list *el, struct strbuf *path)
+{
+	struct exclude_entry *e = xmalloc(sizeof(struct exclude_entry));
+	e->patternlen = path->len;
+	e->pattern = strbuf_detach(path, NULL);
+	hashmap_entry_init(e, memhash(e->pattern, e->patternlen));
+
+	hashmap_add(&el->recursive_hashmap, e);
+
+	while (e->patternlen) {
+		char *slash = strrchr(e->pattern, '/');
+		char *oldpattern = e->pattern;
+		size_t newlen;
+
+		if (!slash)
+			break;
+
+		newlen = slash - e->pattern;
+		e = xmalloc(sizeof(struct exclude_entry));
+		e->patternlen = newlen;
+		e->pattern = xstrndup(oldpattern, newlen);
+		hashmap_entry_init(e, memhash(e->pattern, e->patternlen));
+
+		if (!hashmap_get(&el->parent_hashmap, e, NULL))
+			hashmap_add(&el->parent_hashmap, e);
+	}
+}
+
 static int sparse_checkout_add(int argc, const char **argv)
 {
 	struct exclude_list el;
@@ -196,11 +298,33 @@ static int sparse_checkout_add(int argc, const char **argv)
 	add_excludes_from_file_to_list(sparse_filename, "", 0, &el, NULL);
 
 	fp = fopen(sparse_filename, "w");
-	write_excludes_to_file(fp, &el);
 
-	while (!strbuf_getline(&line, stdin)) {
-		strbuf_trim(&line);
-		fprintf(fp, "%s\n", line.buf);
+	if (core_sparse_checkout == SPARSE_CHECKOUT_FULL) {
+		write_excludes_to_file(fp, &el);
+
+		while (!strbuf_getline(&line, stdin)) {
+			strbuf_trim(&line);
+			fprintf(fp, "%s\n", line.buf);
+		}
+	} else if (core_sparse_checkout == SPARSE_CHECKOUT_CONE) {
+		while (!strbuf_getline(&line, stdin)) {
+			strbuf_trim(&line);
+
+			strbuf_trim_trailing_dir_sep(&line);
+
+			if (!line.len)
+				continue;
+
+			if (line.buf[0] == '/')
+				strbuf_remove(&line, 0, 1);
+
+			if (!line.len)
+				continue;
+
+			insert_recursive_pattern(&el, &line);
+		}
+
+		write_cone_to_file(fp, &el);
 	}
 
 	fclose(fp);
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 60f10864a1..3412bafdff 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -157,4 +157,39 @@ test_expect_success 'sparse-checkout disable' '
 	test_cmp expect dir
 '
 
+test_expect_success 'cone mode: init and add' '
+	git -C repo sparse-checkout init --cone &&
+	git -C repo config --list >config &&
+	test_i18ngrep "core.sparsecheckout=cone" config &&
+	ls repo >dir  &&
+	echo a >expect &&
+	test_cmp expect dir &&
+	echo deep/deeper1/deepest | git -C repo sparse-checkout add &&
+	ls repo >dir  &&
+	cat >expect <<-EOF &&
+		a
+		deep
+	EOF
+	ls repo/deep >dir  &&
+	cat >expect <<-EOF &&
+		a
+		deeper1
+	EOF
+	ls repo/deep/deeper1 >dir  &&
+	cat >expect <<-EOF &&
+		a
+		deepest
+	EOF
+	test_cmp expect dir &&
+	cat >expect <<-EOF &&
+		/*
+		!/*/*
+		/deep/*
+		!/deep/*/*
+		/deep/deeper1/*
+		!/deep/deeper1/*/*
+		/deep/deeper1/deepest/*
+	EOF
+	test_cmp expect repo/.git/info/sparse-checkout
+'
 test_done
\ No newline at end of file
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode
  2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2019-08-20 15:11 ` [PATCH 9/9] sparse-checkout: init and add in cone mode Derrick Stolee via GitGitGadget
@ 2019-08-21 21:52 ` Elijah Newren
  2019-08-22 13:10   ` Derrick Stolee
  9 siblings, 1 reply; 29+ messages in thread
From: Elijah Newren @ 2019-08-21 21:52 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: Git Mailing List, Junio C Hamano

On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This RFC includes a potential direction to make the sparse-checkout more
> user-friendly. While there, I also present a way to use a limited set of
> patterns to gain a significant performance boost in very large repositories.
>
> Sparse-checkout is only documented as a subsection of the read-tree docs
> [1], which makes the feature hard to discover. Users have trouble navigating
> the feature, especially at clone time [2], and have even resorted to
> creating their own helper tools [3].

Ooh, intriguing.  Count me as another person who has resorted to
making my own helper tool for others to use (specific to our internal
repository, though, as it also figures out inter-module dependencies
to allow specifying only a few modules of interest while still
checking out everything needed to build those; but it'd be nice to
need less scripting to handle the git-related bits to actually
sparsify or densify).

> This RFC attempts to solve these problems using a new builtin. Here is a
> sample workflow to give a feeling for how it can work:
>
> In an existing repo:
>
> $ git sparse-checkout init
> $ ls
> myFile1.txt myFile2.txt
>
> $ git sparse-checkout add
> /myFolder/*
> ^D
> $ ls
> myFile1.txt myFile2.txt myFolder
> $ ls myFolder
> a.c a.h
> $ git sparse-checkout disable
> $ ls
> hiddenFolder myFile1.txt myFile2.txt myFolder
>
> At clone time:
>
> $ git clone --sparse origin repo
> $ cd repo
> $ ls
> myFile1.txt myFile2.txt
> $ git sparse-checkout add
> /myFolder/*
> ^D
> $ ls
> myFile1.txt myFile2.txt myFolder
>
> Here are some more specific details:
>
>  * git sparse-checkout init enables core.sparseCheckout and populates the
>    sparse-checkout file with patterns that match only the files at root.

Does it enable core.sparseCheckout in the current worktree, or for all
worktrees?  Do we require extensions.worktreeConfig to be set to true
first?  If we don't require extensions.worktreeConfig to be set to
true, and users add worktrees later, do they encounter negative
surprises (immediately or later)?

worktrees in combination with sparseCheckouts were a headache here
until I just forced people to manually first set
extensions.worktreeConfig to true before using my 'sparsify' script,
regardless of whether the user was currently using worktrees.  That
fixed the issues, but having to provide a long error message and
explanation of why I wanted users to set some special config first was
slightly annoying.

I wonder if 'git worktree' and maybe even 'git config' should
themselves have special handling for core.sparseCheckouts, because it
can be a real mess otherwise.

>  * git clone learns the --sparse argument to run git sparse-checkout init
>    before the first checkout.

Nice.

>  * git sparse-checkout add reads patterns from stdin, one per line, then
>    adds them to the sparse-checkout file and refreshes the working
>    directory.

The default of reading from stdin seems a bit unusual to me, and I
worry about having to explain that to users.  I'd rather the add
command took positional parameters (anything that doesn't start with a
hyphen) and added those, e.g.
  $ git sparse-checkout add '/myFolder/*' '
with the option of the user specifying --stdin.

>  * git sparse-checkout disable removes the patterns from the sparse-checkout
>    file, disables core.sparseCheckout, and refills the working directory.

Does it leave an empty sparse-checkout file around?  Also, what if
users have several paths defining their sparse pattern, and want to
temporarily get a full checkout and then come back -- do they need to
re-specify all the paths?  (Maybe this *is* the route we want to go;
I'm just trying to mention any possible negative effects we _might_
run into so we can consider them.  It's not quite as relevant in my
case since people specify a few toplevel modules and sparse-checkout
gets several entries auto-generated for them.)

Also, I'm particularly worried that a user with multiple worktrees,
both sparse, could run 'git sparse-checkout disable' in one and then
find that when they return to the other worktree they get a variety of
nasty surprises (e.g. accidental staging or committing of the deletion
of a huge number of files, random weird errors, or gradual and weird
de-sparsifying as various git commands are run).  This, of course, can
be averted by making sure core.sparseCheckout is set on a per-worktree
basis, but that seems to be something people only do after running
into problems several times unless some kind of tooling enforces it.

>  * git sparse-checkout list lists the contents of the sparse-checkout file.
>
>
>
> The documentation for the sparse-checkout feature can now live primarily
> with the git-sparse-checkout documentation.

Yaay!

> Cone Mode
> =========
>
> What really got me interested in this area is a performance problem. If we
> have N patterns in the sparse-checkout file and M entries in the index, then
> we can perform up to O(N * M) pattern checks in clear_ce_flags(). This
> quadratic growth is not sustainable in a repo with 1,000+ patterns and
> 1,000,000+ index entries.

This has worried me for a while, even if it hasn't yet caused us
issues in practice.

> To solve this problem, I propose a new, more restrictive mode to
> sparse-checkout: "cone mode". In this mode, all patterns are based on prefix
> matches at a directory level. This can then use hashsets for fast
> performance -- O(M) instead of O(N*M). My hashset implementation is based on
> the virtual filesystem hook in the VFS for Git custom code [4].

Sweet!

> In cone mode, a user specifies a list of folders which the user wants every
> file inside. In addition, the cone adds all blobs that are siblings of the
> folders in the directory path to that folder. This makes the directories
> look "hydrated" as a user drills down to those recursively-closed folders.
> These directories are called "parent" folders, as a file matches them only
> if the file's immediate parent is that directory.
>
> When building a prototype of this feature, I used a separate file to contain
> the list of recursively-closed folders and built the hashsets dynamically
> based on that file. In this implementation, I tried to maximize the amount
> of backwards-compatibility by storing all data in the sparse-checkout file
> using patterns recognized by earlier Git versions.
>
> For example, if we add A/B/C as a recursive folder, then we add the
> following patterns to the sparse-checkout file:
>
> /*
> !/*/*
> /A/*
> !/A/*/*
> /A/B/*
> !/A/B/*/*
> /A/B/C/*
>
> The alternating positive/negative patterns say "include everything in this
> folder, but exclude everything another level deeper". The final pattern has
> no matching negation, so is a recursively closed pattern.

Oh, um, would there be any option for fast but without grabbing
sibling and parent files of requested directories?  And could users
still request individual files (not with regex or pathspec, but fully
specifying the path) and still get the fast mode?

Basically, our sparse usage is exclusively specifying leading
directories or full pathnames of individual files, but we really want
the repo to feel smaller and make sure people notice at a glance.  We
have a huge 'modules/' directory, and want people to be able to get
just 15 of the 500 or so subdirectories that would appear in that
directory with a non-sparse checkout.  And similarly we want to be
able to grab just one or two files from a directory of many files.

> Note that I have some basic warnings to try and check that the
> sparse-checkout file doesn't match what would be written by a cone-mode add.
> In such a case, Git writes a warning to stderr and continues with the old
> pattern matching algorithm. These checks are currently very barebones, and
> would need to be updated with more robust checks for things like regex
> characters in the middle of the pattern. As review moves forward (and if we
> don't change the data storage) then we could spend more time on this.

Instead of trying to validate the sparse-checkout file everytime,
perhaps we want to change core.sparseCheckout from a boolean to a
tri-state or something where it specifies how to parse the
sparse-checkout file?  Or maybe when special directive (some form of
comment-looking line) appears at the top of sparse-checkout then we
use the hashsets speedup while disallowing general regexes and
pathspecs other than leading directories and full pathnames?

I'm not sure if that makes sense or not; just throwing it out there as an idea.

> Derrick Stolee (8):
>   sparse-checkout: create builtin with 'list' subcommand
>   sparse-checkout: create 'init' subcommand
>   clone: add --sparse mode
>   sparse-checkout: 'add' subcommand
>   sparse-checkout: create 'disable' subcommand
>   sparse-checkout: add 'cone' mode
>   sparse-checkout: use hashmaps for cone patterns
>   sparse-checkout: init and add in cone mode
>
> Jeff Hostetler (1):
>   trace2:experiment: clear_ce_flags_1

I'll try to get some time to look over these patches in the next few days.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode
  2019-08-21 21:52 ` [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Elijah Newren
@ 2019-08-22 13:10   ` Derrick Stolee
  2019-08-22 14:25     ` Derrick Stolee
  2019-08-24  5:40     ` Elijah Newren
  0 siblings, 2 replies; 29+ messages in thread
From: Derrick Stolee @ 2019-08-22 13:10 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano

On 8/21/2019 5:52 PM, Elijah Newren wrote:
> On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> This RFC includes a potential direction to make the sparse-checkout more
>> user-friendly. While there, I also present a way to use a limited set of
>> patterns to gain a significant performance boost in very large repositories.
>>
>> Sparse-checkout is only documented as a subsection of the read-tree docs
>> [1], which makes the feature hard to discover. Users have trouble navigating
>> the feature, especially at clone time [2], and have even resorted to
>> creating their own helper tools [3].
> 
> Ooh, intriguing.  Count me as another person who has resorted to
> making my own helper tool for others to use (specific to our internal
> repository, though, as it also figures out inter-module dependencies
> to allow specifying only a few modules of interest while still
> checking out everything needed to build those; but it'd be nice to
> need less scripting to handle the git-related bits to actually
> sparsify or densify).
> 
>> This RFC attempts to solve these problems using a new builtin. Here is a
>> sample workflow to give a feeling for how it can work:
>>
>> In an existing repo:
>>
>> $ git sparse-checkout init
>> $ ls
>> myFile1.txt myFile2.txt
>>
>> $ git sparse-checkout add
>> /myFolder/*
>> ^D
>> $ ls
>> myFile1.txt myFile2.txt myFolder
>> $ ls myFolder
>> a.c a.h
>> $ git sparse-checkout disable
>> $ ls
>> hiddenFolder myFile1.txt myFile2.txt myFolder
>>
>> At clone time:
>>
>> $ git clone --sparse origin repo
>> $ cd repo
>> $ ls
>> myFile1.txt myFile2.txt
>> $ git sparse-checkout add
>> /myFolder/*
>> ^D
>> $ ls
>> myFile1.txt myFile2.txt myFolder
>>
>> Here are some more specific details:
>>
>>  * git sparse-checkout init enables core.sparseCheckout and populates the
>>    sparse-checkout file with patterns that match only the files at root.
> 
> Does it enable core.sparseCheckout in the current worktree, or for all
> worktrees?  Do we require extensions.worktreeConfig to be set to true
> first?  If we don't require extensions.worktreeConfig to be set to
> true, and users add worktrees later, do they encounter negative
> surprises (immediately or later)?

This is an interesting scenario that I had not considered. Thanks!

My guess is that we should set `extensions.worktreeConfig=true` to
avoid surprises. I'll need to play with this to discover the answers
to these questions:

1. Where does the worktree look for the sparse-checkout file? Does
   each worktree have its own sparse-checkout file? Should it?

2. If I have `extensions.worktreeConfig=true` and `core.sparseCheckout=true`
   in the current worktree and run `git worktree add`, does the new worktree
   have `core.sparseCheckout=true`? Can we `git clone --sparse` and then
   start building sparse worktrees seamlessly?
 
> worktrees in combination with sparseCheckouts were a headache here
> until I just forced people to manually first set
> extensions.worktreeConfig to true before using my 'sparsify' script,
> regardless of whether the user was currently using worktrees.  That
> fixed the issues, but having to provide a long error message and
> explanation of why I wanted users to set some special config first was
> slightly annoying.
> 
> I wonder if 'git worktree' and maybe even 'git config' should
> themselves have special handling for core.sparseCheckouts, because it
> can be a real mess otherwise.
> 
>>  * git clone learns the --sparse argument to run git sparse-checkout init
>>    before the first checkout.
> 
> Nice.
> 
>>  * git sparse-checkout add reads patterns from stdin, one per line, then
>>    adds them to the sparse-checkout file and refreshes the working
>>    directory.
> 
> The default of reading from stdin seems a bit unusual to me, and I
> worry about having to explain that to users.  I'd rather the add
> command took positional parameters (anything that doesn't start with a
> hyphen) and added those, e.g.
>   $ git sparse-checkout add '/myFolder/*' '
> with the option of the user specifying --stdin.

I had the same thought, and likely that's where we should go with the
builtin. For our needs, the input over stdin is more important for
testing, so I built it first. I will adjust the CLI here to take a set
of paths over the arguments unless --stdin is given.

>>  * git sparse-checkout disable removes the patterns from the sparse-checkout
>>    file, disables core.sparseCheckout, and refills the working directory.
> 
> Does it leave an empty sparse-checkout file around?  Also, what if
> users have several paths defining their sparse pattern, and want to
> temporarily get a full checkout and then come back -- do they need to
> re-specify all the paths?  (Maybe this *is* the route we want to go;
> I'm just trying to mention any possible negative effects we _might_
> run into so we can consider them.  It's not quite as relevant in my
> case since people specify a few toplevel modules and sparse-checkout
> gets several entries auto-generated for them.)

In this case, there is an intermediate step (that follows the existing
advice) to modify the sparse-checkout file to contain only "/*\n" then
run read-tree to fill the working directory before disabling the config
setting.

Perhaps "disable" is the wrong word to use, as it makes you think that
there should be an "enable" that can quickly toggle between the two
modes. Maybe instead it should be "git sparse-checkout reset [empty|full]"
where you could 'reset' the sparse-checkout to one of two initial
states:

1. empty: only files at root are included.
2. full: all files are included.

In each case, we would obliterate the existing sparse-checkout entries,
but hopefully that behavior is more clear from the command names.

> Also, I'm particularly worried that a user with multiple worktrees,
> both sparse, could run 'git sparse-checkout disable' in one and then
> find that when they return to the other worktree they get a variety of
> nasty surprises (e.g. accidental staging or committing of the deletion
> of a huge number of files, random weird errors, or gradual and weird
> de-sparsifying as various git commands are run).  This, of course, can
> be averted by making sure core.sparseCheckout is set on a per-worktree
> basis, but that seems to be something people only do after running
> into problems several times unless some kind of tooling enforces it.
> 
>>  * git sparse-checkout list lists the contents of the sparse-checkout file.
>>
>>
>>
>> The documentation for the sparse-checkout feature can now live primarily
>> with the git-sparse-checkout documentation.
> 
> Yaay!
> 
>> Cone Mode
>> =========
>>
>> What really got me interested in this area is a performance problem. If we
>> have N patterns in the sparse-checkout file and M entries in the index, then
>> we can perform up to O(N * M) pattern checks in clear_ce_flags(). This
>> quadratic growth is not sustainable in a repo with 1,000+ patterns and
>> 1,000,000+ index entries.
> 
> This has worried me for a while, even if it hasn't yet caused us
> issues in practice.
> 
>> To solve this problem, I propose a new, more restrictive mode to
>> sparse-checkout: "cone mode". In this mode, all patterns are based on prefix
>> matches at a directory level. This can then use hashsets for fast
>> performance -- O(M) instead of O(N*M). My hashset implementation is based on
>> the virtual filesystem hook in the VFS for Git custom code [4].
> 
> Sweet!
> 
>> In cone mode, a user specifies a list of folders which the user wants every
>> file inside. In addition, the cone adds all blobs that are siblings of the
>> folders in the directory path to that folder. This makes the directories
>> look "hydrated" as a user drills down to those recursively-closed folders.
>> These directories are called "parent" folders, as a file matches them only
>> if the file's immediate parent is that directory.
>>
>> When building a prototype of this feature, I used a separate file to contain
>> the list of recursively-closed folders and built the hashsets dynamically
>> based on that file. In this implementation, I tried to maximize the amount
>> of backwards-compatibility by storing all data in the sparse-checkout file
>> using patterns recognized by earlier Git versions.
>>
>> For example, if we add A/B/C as a recursive folder, then we add the
>> following patterns to the sparse-checkout file:
>>
>> /*
>> !/*/*
>> /A/*
>> !/A/*/*
>> /A/B/*
>> !/A/B/*/*
>> /A/B/C/*
>>
>> The alternating positive/negative patterns say "include everything in this
>> folder, but exclude everything another level deeper". The final pattern has
>> no matching negation, so is a recursively closed pattern.
> 
> Oh, um, would there be any option for fast but without grabbing
> sibling and parent files of requested directories?  And could users
> still request individual files (not with regex or pathspec, but fully
> specifying the path) and still get the fast mode?

Exact files could probably be included and still be fast. It requires an
extra hash check per entry, but that's a small price to pay I think.

With the sibling files, this is something I believe to be user-friendly:
as a user drills down into the folder they included recursively, there may
be helpful files along the way, like documentation, project files, etc.

Here is my philosophical position here: a repo can take advantage of the
sparse-checkout feature if it is properly componetized. Those component
boundaries are likely at folder boundaries. Any file that exists in a parent
folder for two components is likely important to _both_ components. If
a file is large and is not needed by both components, it should be placed
deeper in the tree, so it can be avoided.

With that philosophy in mind, I designed this to help users fall into the
"pit of success" when their repo is in a good shape AND to motivate users
with repos in non-optimal shapes to reorganize.

The thought I had about exact file names is similar: if there is a large
list of files in a folder where I only need a subset, then how do I know
if a new file is added that I need? It will not show up in the directory
without updating the sparse-checkout. A user would discover this need by
something going wrong when they are not interacting with version control:
a build.

This is particularly important with the root directory. We need things
like .gitignore, .gitattributes, README, LICENSE, etc. to be populated
by default. If there are too many files at root to reasonably work with
the repo, then the repo should be reorganized using folders.

> Basically, our sparse usage is exclusively specifying leading
> directories or full pathnames of individual files, but we really want
> the repo to feel smaller and make sure people notice at a glance.  We
> have a huge 'modules/' directory, and want people to be able to get
> just 15 of the 500 or so subdirectories that would appear in that
> directory with a non-sparse checkout.  And similarly we want to be
> able to grab just one or two files from a directory of many files.

Your modules/ example seems to work with the feature as designed, as
you want a set of folders one level deeper. Grabbing one or two files
from a directory is a direction we can go with the feature, but I will
continue to believe that should be a rare occurrence compared to including
a folder recursively.

>> Note that I have some basic warnings to try and check that the
>> sparse-checkout file doesn't match what would be written by a cone-mode add.
>> In such a case, Git writes a warning to stderr and continues with the old
>> pattern matching algorithm. These checks are currently very barebones, and
>> would need to be updated with more robust checks for things like regex
>> characters in the middle of the pattern. As review moves forward (and if we
>> don't change the data storage) then we could spend more time on this.
> 
> Instead of trying to validate the sparse-checkout file everytime,
> perhaps we want to change core.sparseCheckout from a boolean to a
> tri-state or something where it specifies how to parse the
> sparse-checkout file?  Or maybe when special directive (some form of
> comment-looking line) appears at the top of sparse-checkout then we
> use the hashsets speedup while disallowing general regexes and
> pathspecs other than leading directories and full pathnames?

In this series, I turn `core.sparseCheckout` into a tri-state, and only
try to validate the sparse-checkout when `core.sparseCheckout=cone`.
This avoids spending time on the validation when someone is content using
the existing feature.

The _intent_ of using the sparse-checkout file and no extra data structure
was to let other clients (or an older client) read the sparse-checkout data
and result in the same working directory. One thing I realized after
submitting is that the tri-state config variable will cause old clients
to error on parsing the non-boolean value. Instead, in v2 I will introduce
a new boolean config variable "core.sparseCheckoutCone" that will do the
same thing as the current series when `core.sparseCheckout=cone` and will
fix this compat scenario.

> I'll try to get some time to look over these patches in the next few days.

I look forward to your feedback! I also have some feedback to respond to
from my team [1], but I'm waiting to make sure the community likes the
overall idea before jumping into code style and method organization
details.

Thanks,
-Stolee

[1] https://github.com/microsoft/git/pull/180


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode
  2019-08-22 13:10   ` Derrick Stolee
@ 2019-08-22 14:25     ` Derrick Stolee
  2019-08-24  5:40     ` Elijah Newren
  1 sibling, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2019-08-22 14:25 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano

On 8/22/2019 9:10 AM, Derrick Stolee wrote:
> On 8/21/2019 5:52 PM, Elijah Newren wrote:
>> On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
>>> For example, if we add A/B/C as a recursive folder, then we add the
>>> following patterns to the sparse-checkout file:
>>>
>>> /*
>>> !/*/*
>>> /A/*
>>> !/A/*/*
>>> /A/B/*
>>> !/A/B/*/*
>>> /A/B/C/*
>>>
>>> The alternating positive/negative patterns say "include everything in this
>>> folder, but exclude everything another level deeper". The final pattern has
>>> no matching negation, so is a recursively closed pattern.
>>
>> Oh, um, would there be any option for fast but without grabbing
>> sibling and parent files of requested directories?  And could users
>> still request individual files (not with regex or pathspec, but fully
>> specifying the path) and still get the fast mode?
> 
> Exact files could probably be included and still be fast. It requires an
> extra hash check per entry, but that's a small price to pay I think.

Quick side note on this point about exact file names and the REAL reason
for the parent paths that I had forgotten until just now.

The following comment exists in unpack-trees.c, clear_ce_flags_dir():

	/*
	 * TODO: check el, if there are no patterns that may conflict
	 * with ret (iow, we know in advance the incl/excl
	 * decision for the entire directory), clear flag here without
	 * calling clear_ce_flags_1(). That function will call
	 * the expensive is_excluded_from_list() on every entry.
	 */

While I haven't implemented it yet in this RFC, this TODO can actually
happen with the current set of cone patterns:

1. If we hit a directory that is not in a parent or recursive path,
   then all paths it contains must have their skipworktree bits set.
   We can avoid computing hashes for them.

2. If we hit a directory that is in a recursive path, then all paths
   it contains must have skipworktree bits off. We can avoid computing
   hashes for them.

When we have a million index entries, these hash computations are not
insignificant!

With that in mind, I think there is a _performance_ reason to include
the parent folders in addition to _user experience_ reason. If we
were to add the complexity of exact file matches, then we would also
want to add the parent folders leading to that file so we can still
do the logic above.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/9] sparse-checkout: create builtin with 'list' subcommand
  2019-08-20 15:11 ` [PATCH 1/9] sparse-checkout: create builtin with 'list' subcommand Derrick Stolee via GitGitGadget
@ 2019-08-23 22:30   ` Elijah Newren
  0 siblings, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2019-08-23 22:30 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Aug 20, 2019 at 8:13 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
<snip>
> The documentation provided is adapted from the "git read-tree"
> documentation with a few edits for clarity in the new context.
> Extra sections are added to hint toward a future change to
> a moer restricted pattern set.

s/moer/more/

<snip>
> +"Sparse checkout" allows populating the working directory sparsely.
> +It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
> +Git whether a file in the working directory is worth looking at. If
> +the skip-worktree bit is set, then the file is ignored in the working
> +directory. Git will not populate the contents of those files, which
> +makes a sparse checkout helpful when working in a repository with many
> +files, but only a few are important to the current user.
> +
> +The `$GIT_DIR/info/sparse-checkout` file is used to define the
> +skip-worktree reference bitmap. When Git updates the working
> +directory, it resets the skip-worktree bit in the index based on this
> +file. If an entry
> +matches a pattern in this file, skip-worktree will not be set on
> +that entry. Otherwise, skip-worktree will be set.
> +
> +Then it compares the new skip-worktree value with the previous one. If
> +skip-worktree turns from set to unset, it will add the corresponding
> +file back. If it turns from unset to set, that file will be removed.

I had to read this twice for it to make sense.  Not sure I have a real
good suggestion here, the name "skip-worktree" instead of e.g
"wanted-in-worktree" just naturally leads us into sentences with one
negation automatically and we sometimes have to add more.

Maybe just replace the last two paragraphs with:

The `$GIT_DIR/info/sparse-checkout` file is used to define the
skip-worktree reference bitmap. When Git updates the working
directory, it updates the skip-worktree bits in the index based on this
file and removes or restores files in the working copy to match.

> +
> +## FULL PATTERN SET
> +
> +By default, the sparse-checkout file uses the same syntax as `.gitignore`
> +files.
> +
> +While `$GIT_DIR/info/sparse-checkout` is usually used to specify what
> +files are in, you can also specify what files are _not_ in, using
> +negate patterns. For example, to remove the file `unwanted`:

s/in/included/?

> +Another tricky thing is fully repopulating the working directory when you
> +no longer want sparse checkout. You cannot just disable "sparse
> +checkout" because skip-worktree bits are still in the index and your working
> +directory is still sparsely populated. You should re-populate the working
> +directory with the `$GIT_DIR/info/sparse-checkout` file content as
> +follows:
> +
> +----------------
> +/*
> +----------------

Can we just get rid of this part of the documentation, since there
will be a sparse-checkout command to disable/undo/reset?  However, it
could be useful to mention cases when disabling/undoing/resetting a
sparse-checkout won't work, if there are some.  For example, with the
previous read-tree implementation, you could not undo the sparse
checkout if the index had any unstaged entries, and you couldn't undo
it if there were any files present that corresponding to the sparse
patterns (for fear they'd be overwritten -- however, every once in a
while someone tried to desparsify, it failed e.g. due to the disk
becoming full, and then after freeing up space there were zillions of
files that exactly matched what de-sparsifying would have put there
but the command wanted the user to manually delete them first.)

> +
> +Then you can disable sparse checkout. Sparse checkout support in 'git
> +read-tree' and similar commands is disabled by default. You need to
> +set `core.sparseCheckout` to `true` in order to have sparse checkout
> +support.

...and get rid of this paragraph because I'd expect git
sparse-checkout to come with a subcommand (init/add/whatever) to set
this for the user?  Unless maybe you want to add some words about why
the command sets core.sparseCheckout...and related workspace related
stuff as we talked about elsewhere.

> +test_expect_success 'git sparse-checkout list (empty)' '
> +       git -C repo sparse-checkout list >list 2>err &&
> +       test_line_count = 0 list &&
> +       test_i18ngrep "failed to parse sparse-checkout file; it may not exist" err

Is that the error we want, rather than something like "This worktree
is not sparse (no sparse-checkout file exists and core.sparseCheckout
is false"?

> +'
> +
> +test_expect_success 'git sparse-checkout list (populated)' '
> +       test_when_finished rm -f repo/.git/info/sparse-checkout &&
> +       cat >repo/.git/info/sparse-checkout <<-EOF &&
> +               /folder1/*
> +               /deep/
> +               **/a
> +               !*bin*
> +       EOF
> +       git -C repo sparse-checkout list >list &&
> +       cat >expect <<-EOF &&
> +               /folder1/*
> +               /deep/
> +               **/a
> +               !*bin*
> +       EOF
> +       test_cmp expect list

I have a `./sparsify --stats` that reports
   You are now in a sparse checkout with only 3499 of the 53625 files.
or
  You are not in a sparse checkout.

and I have a --info option that reports both the stats and the list of
sparse paths similar to this sparse-checkout list command.  The stats
aren't important, I guess, but seem nice for the user.  I don't know
if you want to include anything like that in this or another command,
but just thought I'd mention it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/9] sparse-checkout: create 'init' subcommand
  2019-08-20 15:11 ` [PATCH 2/9] sparse-checkout: create 'init' subcommand Derrick Stolee via GitGitGadget
@ 2019-08-23 23:02   ` Elijah Newren
  2019-09-11 14:27     ` Derrick Stolee
  2019-09-11 20:28     ` Derrick Stolee
  0 siblings, 2 replies; 29+ messages in thread
From: Elijah Newren @ 2019-08-23 23:02 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Aug 20, 2019 at 8:13 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Getting started with a sparse-checkout file can be daunting. Help
> users start their sparse enlistment using 'git sparse-checkout init'.
> This will set 'core.sparseCheckout=true' in their config, write
> an initial set of patterns to the sparse-checkout file, and update
> their working directory.
>
> Using 'git read-tree' to clear directories does not work cleanly
> on Windows, so manually delete directories that are tracked by Git
> before running read-tree.

Is that a bug in read-tree that needs to be fixed?

> The use of running another process for 'git read-tree' is likely
> suboptimal, but that can be improved in a later change, if valuable.

I think it's valuable.  The bigger problem may be that "git read-tree
-mu HEAD" may turn out to be insufficient for our needs; see below....

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt |   7 ++
>  builtin/sparse-checkout.c             | 106 +++++++++++++++++++++++++-
>  t/t1091-sparse-checkout-builtin.sh    |  40 ++++++++++
>  3 files changed, 152 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index ca0ca6a12f..50c53ee60a 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -26,6 +26,13 @@ COMMANDS
>  'list'::
>         Provide a list of the contents in the sparse-checkout file.
>
> +'init'::
> +       Enable the `core.sparseCheckout` setting. If the
> +       sparse-checkout file does not exist, then populate it with
> +       patterns that match every file in the root directory and
> +       no other directories, then will remove all directories tracked
> +       by Git. Add patterns to the sparse-checkout file to
> +       repopulate the working directory.
>
>  SPARSE CHECKOUT
>  ----------------
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 6477a6ed9c..86d24e6295 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -8,7 +8,7 @@
>  #include "strbuf.h"
>
>  static char const * const builtin_sparse_checkout_usage[] = {
> -       N_("git sparse-checkout [list]"),
> +       N_("git sparse-checkout [init|list]"),
>         NULL
>  };
>
> @@ -64,6 +64,108 @@ static int sparse_checkout_list(int argc, const char **argv)
>         return 0;
>  }
>
> +static int sc_read_tree(void)
> +{
> +       struct argv_array argv = ARGV_ARRAY_INIT;
> +       int result = 0;
> +       argv_array_pushl(&argv, "read-tree", "-m", "-u", "HEAD", NULL);
> +
> +       if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
> +               error(_("failed to update index with new sparse-checkout paths"));
> +               result = 1;
> +       }

`git read-tree -m -u HEAD` will fail if the index has any higher stage
entries in it, even if those higher stage entries correspond to files
which are included in the sparseness patterns and thus would not need
an update.  It might be nice if we can find a way to provide a better
error message, and/or implement the read-tree -m -u HEAD internally in
a way that will allow us to not fail if the conflicted files are
included in the sparse set.

> +
> +       argv_array_clear(&argv);
> +       return result;
> +}
> +
> +static int sc_enable_config(void)
> +{
> +       struct argv_array argv = ARGV_ARRAY_INIT;
> +       int result = 0;
> +       argv_array_pushl(&argv, "config", "--add", "core.sparseCheckout", "true", NULL);

Why --add?  That seems really odd to me.

This should also have "--worktree".  And this function should either
set extensions.worktreeConfig to true or die if it isn't already set;
not sure which.  There's some UI and documentation stuff to figure out
here...

> +
> +       if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
> +               error(_("failed to enable core.sparseCheckout"));
> +               result = 1;
> +       }
> +
> +       argv_array_clear(&argv);
> +       return result;
> +}
> +
> +static int delete_directory(const struct object_id *oid, struct strbuf *base,
> +               const char *pathname, unsigned mode, int stage, void *context)
> +{
> +       struct strbuf dirname = STRBUF_INIT;
> +       struct stat sb;
> +
> +       strbuf_addstr(&dirname, the_repository->worktree);
> +       strbuf_addch(&dirname, '/');
> +       strbuf_addstr(&dirname, pathname);
> +
> +       if (stat(dirname.buf, &sb) || !(sb.st_mode & S_IFDIR))
> +               return 0;
> +
> +       if (remove_dir_recursively(&dirname, 0))

flags = 0 implies not REMOVE_DIR_EMPTY_ONLY.  I'm not familiar with
remove_dir_recursively(), but won't this delete everything...including
untracked files?  If so, that sounds like a bug.

> +               warning(_("failed to remove directory '%s'"),
> +                       dirname.buf);
> +
> +       strbuf_release(&dirname);
> +       return 0;
> +}
> +
> +static int sparse_checkout_init(int argc, const char **argv)
> +{
> +       struct tree *t;
> +       struct object_id oid;
> +       struct exclude_list el;
> +       static struct pathspec pathspec;
> +       char *sparse_filename;
> +       FILE *fp;
> +       int res;
> +
> +       if (sc_enable_config())
> +               return 1;
> +
> +       memset(&el, 0, sizeof(el));
> +
> +       sparse_filename = get_sparse_checkout_filename();
> +       res = add_excludes_from_file_to_list(sparse_filename, "", 0, &el, NULL);

But 'el' isn't used again?  Why are we getting the list of files from
sparse_filename then?

> +
> +       /* If we already have a sparse-checkout file, use it. */
> +       if (res >= 0) {
> +               free(sparse_filename);
> +               goto reset_dir;
> +       }
> +
> +       /* initial mode: all blobs at root */
> +       fp = fopen(sparse_filename, "w");
> +       free(sparse_filename);
> +       fprintf(fp, "/*\n!/*/*\n");
> +       fclose(fp);

Makes sense.

> +
> +       /* remove all directories in the root, if tracked by Git */
> +       if (get_oid("HEAD", &oid)) {
> +               /* assume we are in a fresh repo */
> +               return 0;
> +       }
> +
> +       t = parse_tree_indirect(&oid);
> +
> +       parse_pathspec(&pathspec, PATHSPEC_ALL_MAGIC &
> +                                 ~(PATHSPEC_FROMTOP | PATHSPEC_LITERAL),
> +                      PATHSPEC_PREFER_CWD,
> +                      "", NULL);
> +
> +       if (read_tree_recursive(the_repository, t, "", 0, 0, &pathspec,
> +                               delete_directory, NULL))
> +               return 1;

Since this is only needed on Windows, as per your commit message,
should it be #ifdef'd?  Or is this actually a bug that should be fixed
in "git read-tree -mu HEAD"?

> +
> +reset_dir:
> +       return sc_read_tree();
> +}
> +

The rest looks fine.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/9] clone: add --sparse mode
  2019-08-20 15:11 ` [PATCH 3/9] clone: add --sparse mode Derrick Stolee via GitGitGadget
@ 2019-08-23 23:17   ` Elijah Newren
  0 siblings, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2019-08-23 23:17 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When someone wants to clone a large repository, but plans to work
> using a sparse-checkout file, they either need to do a full
> checkout first and then reduce the patterns they included, or
> clone with --no-checkout, set up their patterns, and then run
> a checkout manually. This requires knowing a lot about the repo
> shape and how sparse-checkout works.
>
> Add a new '--sparse' option to 'git clone' that initializes the
> sparse-checkout file to include the following patterns:
>
>         /*
>         !/*/*
>
> These patterns include every file in the root directory, but
> no directories. This allows a repo to include files like a
> README or a bootstrapping script to grow enlistments from that
> point.

Nice.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-clone.txt        |  8 +++++++-
>  builtin/clone.c                    | 27 +++++++++++++++++++++++++++
>  t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++
>  3 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index 34011c2940..0fe91d2f04 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -15,7 +15,7 @@ SYNOPSIS
>           [--dissociate] [--separate-git-dir <git dir>]
>           [--depth <depth>] [--[no-]single-branch] [--no-tags]
>           [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
> -         [--[no-]remote-submodules] [--jobs <n>] [--] <repository>
> +         [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--] <repository>
>           [<directory>]
>
>  DESCRIPTION
> @@ -156,6 +156,12 @@ objects from the source repository into a pack in the cloned repository.
>         used, neither remote-tracking branches nor the related
>         configuration variables are created.
>
> +--sparse::
> +       Initialize the sparse-checkout file so the working
> +       directory starts with only the files in the root
> +       of the repository. The sparse-checkout file can be
> +       modified to grow the working directory as needed.
> +
>  --mirror::
>         Set up a mirror of the source repository.  This implies `--bare`.
>         Compared to `--bare`, `--mirror` not only maps local branches of the
> diff --git a/builtin/clone.c b/builtin/clone.c
> index f665b28ccc..d6d49a73ff 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -60,6 +60,7 @@ static const char *real_git_dir;
>  static char *option_upload_pack = "git-upload-pack";
>  static int option_verbosity;
>  static int option_progress = -1;
> +static int option_sparse_checkout;
>  static enum transport_family family;
>  static struct string_list option_config = STRING_LIST_INIT_NODUP;
>  static struct string_list option_required_reference = STRING_LIST_INIT_NODUP;
> @@ -147,6 +148,8 @@ static struct option builtin_clone_options[] = {
>         OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
>         OPT_BOOL(0, "remote-submodules", &option_remote_submodules,
>                     N_("any cloned submodules will use their remote-tracking branch")),
> +       OPT_BOOL(0, "sparse", &option_sparse_checkout,
> +                   N_("initialize sparse-checkout file to include only files at root")),
>         OPT_END()
>  };
>
> @@ -734,6 +737,27 @@ static void update_head(const struct ref *our, const struct ref *remote,
>         }
>  }
>
> +static int git_sparse_checkout_init(const char *repo)
> +{
> +       struct argv_array argv = ARGV_ARRAY_INIT;
> +       int result = 0;
> +       argv_array_pushl(&argv, "-C", repo, "sparse-checkout", "init", NULL);
> +
> +       /*
> +        * We must apply the setting in the current process
> +        * for the later checkout to use the sparse-checkout file.
> +        */
> +       core_apply_sparse_checkout = 1;
> +
> +       if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
> +               error(_("failed to initialize sparse-checkout"));
> +               result = 1;
> +       }

Sigh...so much forking of additional processes.  I'd really rather
that we were reducing how much of this we are doing in the codebase
instead of adding more.  Every fork makes following stuff in a
debugger harder.

> +
> +       argv_array_clear(&argv);
> +       return result;
> +}
> +
>  static int checkout(int submodule_progress)
>  {
>         struct object_id oid;
> @@ -1107,6 +1131,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>         if (option_required_reference.nr || option_optional_reference.nr)
>                 setup_reference();
>
> +       if (option_sparse_checkout && git_sparse_checkout_init(repo))
> +               return 1;
> +
>         remote = remote_get(option_origin);
>
>         strbuf_addf(&default_refspec, "+%s*:%s*", src_ref_prefix,
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 35ab84aabd..b7d5f15830 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -87,4 +87,17 @@ test_expect_success 'init with existing sparse-checkout' '
>         test_cmp expect dir
>  '
>
> +test_expect_success 'clone --sparse' '
> +       git clone --sparse repo clone &&
> +       git -C clone sparse-checkout list >actual &&
> +       cat >expect <<-EOF &&
> +               /*
> +               !/*/*
> +       EOF
> +       test_cmp expect actual &&
> +       ls clone >dir &&
> +       echo a >expect &&
> +       test_cmp expect dir

Checking that a toplevel entry is present, but not checking that an
entry from a subdir is missing as expected?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/9] sparse-checkout: 'add' subcommand
  2019-08-20 15:11 ` [PATCH 4/9] sparse-checkout: 'add' subcommand Derrick Stolee via GitGitGadget
@ 2019-08-23 23:30   ` Elijah Newren
  0 siblings, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2019-08-23 23:30 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The 'git sparse-checkout add' subcommand takes a list of patterns
> over stdin and writes them to the sparse-checkout file. Then, it
> updates the working directory using 'git read-tree -mu HEAD'.

As mentioned in response to the cover letter, I'd rather see it take
patterns as positional arguments (though requiring a '--' argument
before any patterns that start with a hyphen).  It could also take
--stdin to read from stdin.

> Note: if a user adds a negative pattern that would lead to the
> removal of a non-empty directory, then Git may not delete that
> directory (on Windows).

This sounds like you're re-iterating a bug mentioned earlier, but if
someone in the future comes and reads this comment it might sound like
you're saying git can avoid clearing a directory for optimization or
other reasons.  (And, of course, it'd be nice to figure out why this
bug exists.)

Another question this brings up, though, is that you worked around
this bug in 'init' so why would you not also do so for 'add'?  Seems
inconsistent to me.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt |  4 ++++
>  builtin/sparse-checkout.c             | 32 ++++++++++++++++++++++++++-
>  t/t1091-sparse-checkout-builtin.sh    | 20 +++++++++++++++++
>  3 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index 50c53ee60a..6f540a3443 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -34,6 +34,10 @@ COMMANDS
>         by Git. Add patterns to the sparse-checkout file to
>         repopulate the working directory.
>
> +'add'::
> +       Add a set of patterns to the sparse-checkout file, as given over
> +       stdin. Updates the working directory to match the new patterns.
> +
>  SPARSE CHECKOUT
>  ----------------
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 86d24e6295..ec6134fecc 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -8,7 +8,7 @@
>  #include "strbuf.h"
>
>  static char const * const builtin_sparse_checkout_usage[] = {
> -       N_("git sparse-checkout [init|list]"),
> +       N_("git sparse-checkout [init|add|list]"),
>         NULL
>  };
>
> @@ -166,6 +166,34 @@ static int sparse_checkout_init(int argc, const char **argv)
>         return sc_read_tree();
>  }
>
> +static int sparse_checkout_add(int argc, const char **argv)
> +{
> +       struct exclude_list el;
> +       char *sparse_filename;
> +       FILE *fp;
> +       struct strbuf line = STRBUF_INIT;
> +
> +       memset(&el, 0, sizeof(el));
> +
> +       sparse_filename = get_sparse_checkout_filename();
> +       add_excludes_from_file_to_list(sparse_filename, "", 0, &el, NULL);

el is an exclude_list and we call add_excludes_..., but it's actually
an *include* list.  This is going to cause errors at some point, and
will cause lots of headaches.

> +
> +       fp = fopen(sparse_filename, "w");
> +       write_excludes_to_file(fp, &el);
> +
> +       while (!strbuf_getline(&line, stdin)) {
> +               strbuf_trim(&line);
> +               fprintf(fp, "%s\n", line.buf);
> +       }

Should we first check whether these excludes are already in the
sparse-checkout file?

> +       fclose(fp);
> +       free(sparse_filename);
> +
> +       clear_exclude_list(&el);
> +
> +       return sc_read_tree();

What if someone calls 'git sparse-checkout add' without first calling
'git sparse-checkout init'?  As far as I can tell, core.sparseCheckout
will be unset (i.e. treated as false), meaning that this operation
will do some work, but result in no changes and a report of success.
After users try to figure out why it won't work, they eventually run
'git sparse-checkout init', which will delete all the entries they
previously added with the 'add' subcommand.

What should happen instead?

> +}
> +
>  int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>  {
>         static struct option builtin_sparse_checkout_options[] = {
> @@ -187,6 +215,8 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>                         return sparse_checkout_list(argc, argv);
>                 if (!strcmp(argv[0], "init"))
>                         return sparse_checkout_init(argc, argv);
> +               if (!strcmp(argv[0], "add"))
> +                       return sparse_checkout_add(argc, argv);
>         }
>
>         usage_with_options(builtin_sparse_checkout_usage,
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index b7d5f15830..499bd8d6d0 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -100,4 +100,24 @@ test_expect_success 'clone --sparse' '
>         test_cmp expect dir
>  '
>
> +test_expect_success 'add to existing sparse-checkout' '
> +       echo "/folder2/*" | git -C repo sparse-checkout add &&

I've always been using '/folder2/' in sparse-checkout, without the
trailing asterisk.  That seems more friendly for cone mode too.  Are
there benefits to keeping the trailing asterisk?

> +       cat >expect <<-EOF &&
> +               /*
> +               !/*/*
> +               /folder1/*
> +               /folder2/*
> +       EOF
> +       git -C repo sparse-checkout list >actual &&
> +       test_cmp expect actual &&
> +       test_cmp expect repo/.git/info/sparse-checkout &&
> +       ls repo >dir  &&
> +       cat >expect <<-EOF &&
> +               a
> +               folder1
> +               folder2
> +       EOF
> +       test_cmp expect dir
> +'
> +
>  test_done
> \ No newline at end of file
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/9] sparse-checkout: create 'disable' subcommand
  2019-08-20 15:11 ` [PATCH 5/9] sparse-checkout: create 'disable' subcommand Derrick Stolee via GitGitGadget
@ 2019-08-23 23:50   ` Elijah Newren
  0 siblings, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2019-08-23 23:50 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Aug 20, 2019 at 8:14 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The instructions for disabling a sparse-checkout to a full
> working directory are complicated and non-intuitive. Add a
> subcommand, 'git sparse-checkout disable', to perform those
> steps for the user.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt | 26 +++++++---------
>  builtin/sparse-checkout.c             | 45 ++++++++++++++++++++++++---
>  t/t1091-sparse-checkout-builtin.sh    | 15 +++++++++
>  3 files changed, 67 insertions(+), 19 deletions(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index 6f540a3443..de04b768ae 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -38,6 +38,10 @@ COMMANDS
>         Add a set of patterns to the sparse-checkout file, as given over
>         stdin. Updates the working directory to match the new patterns.
>
> +'disable'::
> +       Remove the sparse-checkout file, set `core.sparseCheckout` to
> +       `false`, and restore the working directory to include all files.
> +
>  SPARSE CHECKOUT
>  ----------------
>
> @@ -60,6 +64,13 @@ Then it compares the new skip-worktree value with the previous one. If
>  skip-worktree turns from set to unset, it will add the corresponding
>  file back. If it turns from unset to set, that file will be removed.
>
> +To repopulate the working directory with all files, use the
> +`git sparse-checkout disable` command.

Make sense.

> +
> +Sparse checkout support in 'git read-tree' and similar commands is
> +disabled by default. You need to set `core.sparseCheckout` to `true`
> +in order to have sparse checkout support.

But why add this paragraph?  read-tree is plumbing that we'd rather
not force on users, so I'd rather not mention it.  And I thought we
were setting core.sparseCheckout for the user.

> -Another tricky thing is fully repopulating the working directory when you
> -no longer want sparse checkout. You cannot just disable "sparse
> -checkout" because skip-worktree bits are still in the index and your working
> -directory is still sparsely populated. You should re-populate the working
> -directory with the `$GIT_DIR/info/sparse-checkout` file content as
> -follows:
> -
> -----------------
> -/*
> -----------------
> -
> -Then you can disable sparse checkout. Sparse checkout support in 'git
> -read-tree' and similar commands is disabled by default. You need to
> -set `core.sparseCheckout` to `true` in order to have sparse checkout
> -support.

Wahoo!


> -static int sc_enable_config(void)
> +static int sc_set_config(int mode)
>  {
>         struct argv_array argv = ARGV_ARRAY_INIT;
>         int result = 0;
> -       argv_array_pushl(&argv, "config", "--add", "core.sparseCheckout", "true", NULL);
> +       argv_array_pushl(&argv, "config", "--add", "core.sparseCheckout", NULL);

Remove "--add" and add "--worktree".

> +
> +       switch (mode) {
> +       case 1:
> +               argv_array_pushl(&argv, "true", NULL);
> +               break;
> +
> +       case 0:
> +               argv_array_pushl(&argv, "false", NULL);
> +               break;
> +
> +       default:
> +               die(_("invalid config mode"));
> +       }
>
>         if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {

Can't we use git_config_set instead of run_command_v_opt?

>                 error(_("failed to enable core.sparseCheckout"));
> @@ -125,7 +138,7 @@ static int sparse_checkout_init(int argc, const char **argv)
>         FILE *fp;
>         int res;
>
> -       if (sc_enable_config())
> +       if (sc_set_config(1))

Random idea: The code would be more self-documenting, if you just made
sc_set_config take a string and you passed either "true" or "false".
Then you could also dispense with the switch statement.  Yes, you can
make typos, but there are only going to be 2-3 places in the code that
call it, right?

>                 return 1;
>
>         memset(&el, 0, sizeof(el));
> @@ -194,6 +207,28 @@ static int sparse_checkout_add(int argc, const char **argv)
>         return sc_read_tree();
>  }
>
> +static int sparse_checkout_disable(int argc, const char **argv)
> +{
> +       char *sparse_filename;
> +       FILE *fp;
> +
> +       if (sc_set_config(1))
> +               die(_("failed to change config"));

Ooh, preventative setting.  This'll probably save some grief for
people who get themselves in a weird state and start mucking around
manually; very nice.

> +
> +       sparse_filename = get_sparse_checkout_filename();
> +       fp = fopen(sparse_filename, "w");
> +       fprintf(fp, "/*\n");
> +       fclose(fp);
> +
> +       if (sc_read_tree())
> +               die(_("error while refreshing working directory"));

This can fail e.g. if there are conflicts in the index.  Luckily, if
they fix them up and then re-run "git sparse-checkout disable" then it
can work.  But you've already changed the
$GIT_DIR/info/sparse-checkout file to a state that won't match what
they actually have checked out.  Will that cause any problems?  (I
can't think of any immediately, but I am curious.)

Also, should or can we give a better error message, such as
instructing them to try again after they've fixed any problems
reported by the sc_read_tree() operation?

> +
> +       unlink(sparse_filename);
> +       free(sparse_filename);
> +
> +       return sc_set_config(0);
> +}
> +
>  int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>  {
>         static struct option builtin_sparse_checkout_options[] = {
> @@ -217,6 +252,8 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>                         return sparse_checkout_init(argc, argv);
>                 if (!strcmp(argv[0], "add"))
>                         return sparse_checkout_add(argc, argv);
> +               if (!strcmp(argv[0], "disable"))
> +                       return sparse_checkout_disable(argc, argv);
>         }
>
>         usage_with_options(builtin_sparse_checkout_usage,
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 499bd8d6d0..68ca63a6f6 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -120,4 +120,19 @@ test_expect_success 'add to existing sparse-checkout' '
>         test_cmp expect dir
>  '
>
> +test_expect_success 'sparse-checkout disable' '
> +       git -C repo sparse-checkout disable &&
> +       test_path_is_missing repo/.git/info/sparse-checkout &&
> +       git -C repo config --list >config &&

Should this command also have a --worktree flag (just for consistency,
even if we know there will only be one worktree so it doesn't matter
in practice)?  There are probably similar cases I overlooked on the
previous patches.

> +       test_i18ngrep "core.sparsecheckout=false" config &&
> +       ls repo >dir &&
> +       cat >expect <<-EOF &&
> +               a
> +               deep
> +               folder1
> +               folder2
> +       EOF
> +       test_cmp expect dir
> +'
> +
>  test_done
> \ No newline at end of file
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/9] trace2:experiment: clear_ce_flags_1
  2019-08-20 15:11 ` [PATCH 6/9] trace2:experiment: clear_ce_flags_1 Jeff Hostetler via GitGitGadget
@ 2019-08-24  0:08   ` Elijah Newren
  0 siblings, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2019-08-24  0:08 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Jeff Hostetler

On Tue, Aug 20, 2019 at 8:12 AM Jeff Hostetler via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Jeff Hostetler <jeffhost@microsoft.com>

Can the commit summary be turned into English?

> The clear_ce_flags_1 method is used by many types of calls to
> unpack_trees(). Add trace2 regions around the method, including
> some flag information, so we can get granular performance data
> during experiments.

It might be nice to have some words in the cover letter about why this
patch is included in this series instead of being a separate
submission.  I'm not familiar with the trace2 stuff yet; this looks
probably useful, but the commit message makes it sound like something
general rather than specific to this series.

> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
<snip>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/9] sparse-checkout: add 'cone' mode
  2019-08-20 15:11 ` [PATCH 7/9] sparse-checkout: add 'cone' mode Derrick Stolee via GitGitGadget
@ 2019-08-24  0:31   ` Elijah Newren
  0 siblings, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2019-08-24  0:31 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Aug 20, 2019 at 8:13 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The sparse-checkout feature can have quadratic performance as
> the number of patterns and number of entries in the index grow.
> If there are 1,000 patterns and 1,000,000 entries, this time can
> be very significant.
>
> Create a new 'cone' mode for the core.sparseCheckout config
> option, and adjust the parser to set an appropriate enum value.
>
> While adjusting the type of this variable, rename it from
> core_apply_sparse_checkout to core_sparse_checkout. This will
> help avoid parallel changes from hitting type issues, and we
> can guarantee that all uses now consider the enum values instead
> of the int value.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config/core.txt         |  7 ++--
>  Documentation/git-sparse-checkout.txt | 50 +++++++++++++++++++++++++++
>  builtin/clone.c                       |  2 +-
>  builtin/sparse-checkout.c             | 16 +++++----
>  cache.h                               |  8 ++++-
>  config.c                              | 10 +++++-
>  environment.c                         |  2 +-
>  t/t1091-sparse-checkout-builtin.sh    | 14 ++++++++
>  unpack-trees.c                        |  2 +-
>  9 files changed, 98 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 75538d27e7..9b8ab2a6d4 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -591,8 +591,11 @@ core.multiPackIndex::
>         multi-pack-index design document].
>
>  core.sparseCheckout::
> -       Enable "sparse checkout" feature. See section "Sparse checkout" in
> -       linkgit:git-read-tree[1] for more information.
> +       Enable "sparse checkout" feature. If "false", then sparse-checkout
> +       is disabled. If "true", then sparse-checkout is enabled with the full
> +       .gitignore pattern set. If "cone", then sparse-checkout is enabled with
> +       a restricted pattern set. See linkgit:git-sparse-checkout[1] for more
> +       information.
>
>  core.abbrev::
>         Set the length object names are abbreviated to.  If
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index de04b768ae..463319055b 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -86,6 +86,56 @@ negate patterns. For example, to remove the file `unwanted`:
>  ----------------
>
>
> +## CONE PATTERN SET
> +
> +The full pattern set allows for arbitrary pattern matches and complicated
> +inclusion/exclusion rules. These can result in O(N*M) pattern matches when
> +updating the index, where N is the number of patterns and M is the number
> +of paths in the index. To combat this performance issue, a more restricted
> +pattern set is allowed when `core.spareCheckout` is set to `cone`.
> +
> +The accepted patterns in the cone pattern set are:
> +
> +1. *Recursive:* All paths inside a directory are included.
> +
> +2. *Parent:* All files immediately inside a directory are included.
> +
> +In addition to the above two patterns, we also expect that all files in the
> +root directory are included. If a recursive pattern is added, then all
> +leading directories are added as parent patterns.
> +
> +By default, when running `git sparse-checkout init`, the root directory is
> +added as a parent pattern. At this point, the sparse-checkout file contains
> +the following patterns:
> +
> +```
> +/*
> +!/*/*
> +```
> +
> +This says "include everything in root, but nothing two levels below root."

...but nothing at the level below root...?

> +If we then add the folder `A/B/C` as a recursive pattern, the folders `A` and
> +`A/B` are added as parent patterns. The resulting sparse-checkout file is
> +now
> +
> +```
> +/*
> +!/*/*
> +/A/*
> +!/A/*/*
> +/A/B/*
> +!/A/B/*/*
> +/A/B/C/*
> +```

Can we dispense with the trailing asterisks (other than on the first
line for the root level)?  This reads a lot cleaner to me:

```
/*
!/*/
/A/
!/A/*/
/A/B/
!/A/B/*/
/A/B/C/
```

We could also dispense with the trailing '/' on the inclusion lines
from this version, but I'm not sure that helps.

> +
> +Here, order matters, so the negative patterns are overridden by the positive
> +patterns that appear lower in the file.
> +
> +If `core.sparseCheckout=cone`, then Git will parse the sparse-checkout file
> +expecting patterns of these types. Git will warn if the patterns do not match.
> +If the patterns do match the expected format, then Git will use faster hash-
> +based algorithms to compute inclusion in the sparse-checkout.
> +
>  SEE ALSO
>  --------
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index d6d49a73ff..763898ada5 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -747,7 +747,7 @@ static int git_sparse_checkout_init(const char *repo)
>          * We must apply the setting in the current process
>          * for the later checkout to use the sparse-checkout file.
>          */
> -       core_apply_sparse_checkout = 1;
> +       core_sparse_checkout = SPARSE_CHECKOUT_FULL;
>
>         if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
>                 error(_("failed to initialize sparse-checkout"));
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 8f97c27ec7..77e5235720 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -79,18 +79,22 @@ static int sc_read_tree(void)
>         return result;
>  }
>
> -static int sc_set_config(int mode)
> +static int sc_set_config(enum sparse_checkout_mode mode)
>  {
>         struct argv_array argv = ARGV_ARRAY_INIT;
>         int result = 0;
>         argv_array_pushl(&argv, "config", "--add", "core.sparseCheckout", NULL);
>
>         switch (mode) {
> -       case 1:
> +       case SPARSE_CHECKOUT_FULL:
>                 argv_array_pushl(&argv, "true", NULL);
>                 break;
>
> -       case 0:
> +       case SPARSE_CHECKOUT_CONE:
> +               argv_array_pushl(&argv, "cone", NULL);
> +               break;
> +
> +       case SPARSE_CHECKOUT_NONE:
>                 argv_array_pushl(&argv, "false", NULL);
>                 break;
>
> @@ -138,7 +142,7 @@ static int sparse_checkout_init(int argc, const char **argv)
>         FILE *fp;
>         int res;
>
> -       if (sc_set_config(1))
> +       if (sc_set_config(SPARSE_CHECKOUT_FULL))

Going back to my comment on the previous patch, perhaps
SPARSE_CHECKOUT_FULL could be the string "true", so you can avoid the
switch statement?

>                 return 1;
>
>         memset(&el, 0, sizeof(el));
> @@ -212,7 +216,7 @@ static int sparse_checkout_disable(int argc, const char **argv)
>         char *sparse_filename;
>         FILE *fp;
>
> -       if (sc_set_config(1))
> +       if (sc_set_config(SPARSE_CHECKOUT_FULL))
>                 die(_("failed to change config"));
>
>         sparse_filename = get_sparse_checkout_filename();
> @@ -226,7 +230,7 @@ static int sparse_checkout_disable(int argc, const char **argv)
>         unlink(sparse_filename);
>         free(sparse_filename);
>
> -       return sc_set_config(0);
> +       return sc_set_config(SPARSE_CHECKOUT_NONE);
>  }
>
>  int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
> diff --git a/cache.h b/cache.h
> index b1da1ab08f..4426816ca1 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -865,12 +865,18 @@ extern char *git_replace_ref_base;
>
>  extern int fsync_object_files;
>  extern int core_preload_index;
> -extern int core_apply_sparse_checkout;
>  extern int precomposed_unicode;
>  extern int protect_hfs;
>  extern int protect_ntfs;
>  extern const char *core_fsmonitor;
>
> +enum sparse_checkout_mode {
> +       SPARSE_CHECKOUT_NONE = 0,
> +       SPARSE_CHECKOUT_FULL = 1,
> +       SPARSE_CHECKOUT_CONE = 2,
> +};
> +enum sparse_checkout_mode core_sparse_checkout;
> +
>  /*
>   * Include broken refs in all ref iterations, which will
>   * generally choke dangerous operations rather than letting

Wait, you're not changing the add command?  So the cone mode just

> diff --git a/config.c b/config.c
> index 3900e4947b..15b7a20dd9 100644
> --- a/config.c
> +++ b/config.c
> @@ -1360,7 +1360,15 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
>         }
>
>         if (!strcmp(var, "core.sparsecheckout")) {
> -               core_apply_sparse_checkout = git_config_bool(var, value);
> +               int result = git_parse_maybe_bool(value);
> +
> +               if (result < 0) {
> +                       core_sparse_checkout = SPARSE_CHECKOUT_NONE;
> +
> +                       if (!strcasecmp(value, "cone"))
> +                               core_sparse_checkout = SPARSE_CHECKOUT_CONE;
> +               } else
> +                       core_sparse_checkout = result;
>                 return 0;
>         }
>
> diff --git a/environment.c b/environment.c
> index 89af47cb85..cc12e30bd6 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -68,7 +68,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED;
>  enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
>  char *notes_ref_name;
>  int grafts_replace_parents = 1;
> -int core_apply_sparse_checkout;
> +enum sparse_checkout_mode core_sparse_checkout;
>  int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
>  unsigned long pack_size_limit_cfg;
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 68ca63a6f6..8cc377b839 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -120,6 +120,20 @@ test_expect_success 'add to existing sparse-checkout' '
>         test_cmp expect dir
>  '
>
> +test_expect_success 'cone mode: match patterns' '
> +       git -C repo config --replace-all core.sparseCheckout cone &&

--replace-all?  This makes me wonder if you were actually doing the
--add in the previous patchsets on purpose.  I'm so confused.

> +       rm -rf repo/a repo/folder1 repo/folder2 &&
> +       git -C repo read-tree -mu HEAD &&
> +       git -C repo reset --hard &&
> +       ls repo >dir  &&
> +       cat >expect <<-EOF &&
> +               a
> +               folder1
> +               folder2
> +       EOF
> +       test_cmp expect dir
> +'
> +
>  test_expect_success 'sparse-checkout disable' '
>         git -C repo sparse-checkout disable &&
>         test_path_is_missing repo/.git/info/sparse-checkout &&
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 8c3b5e8849..289c62305f 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1468,7 +1468,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>
>         trace_performance_enter();
>         memset(&el, 0, sizeof(el));
> -       if (!core_apply_sparse_checkout || !o->update)
> +       if (!core_sparse_checkout || !o->update)
>                 o->skip_sparse_checkout = 1;
>         if (!o->skip_sparse_checkout) {
>                 char *sparse = git_pathdup("info/sparse-checkout");
> --
> gitgitgadget

Wait...I didn't see anything checking the value of "cone" and using
it, it only has an ability to set it.  What's the point?  Or is that
going to come in a later patch?  (If it does, should the commit
message mention that?)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 8/9] sparse-checkout: use hashmaps for cone patterns
  2019-08-20 15:11 ` [PATCH 8/9] sparse-checkout: use hashmaps for cone patterns Derrick Stolee via GitGitGadget
@ 2019-08-24  4:56   ` Elijah Newren
  0 siblings, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2019-08-24  4:56 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The parent and recursive patterns allowed by the "cone mode"
> option in sparse-checkout are restrictive enough that we
> can avoid using the regex parsing. Everything is based on
> prefix matches, so we can use hashsets to store the prefixes
> from the sparse-checkout file. When checking a path, we can
> strip path entries from the path and check the hashset for
> an exact match.
>
> As a test, I created a cone-mode sparse-checkout file for the
> Linux repository that actually includes every file. This was
> constructed by taking every folder in the Linux repo and creating
> the pattern pairs here:
>
>         /$folder/*
>         !/$folder/*/*
>
> This resulted in a sparse-checkout file sith 8,296 patterns.
> Running 'git read-tree -mu HEAD' on this file had the following
> performance:
>
>         core.sparseCheckout=false: 0.21 s (0.00 s)
>          core.sparseCheckout=true: 3.75 s (3.50 s)
>          core.sparseCheckout=cone: 0.23 s (0.01 s)

Nice!

> The times in parentheses above correspond to the time spent
> in the first clear_ce_flags() call, according to the trace2
> performance traces.
>
> While this example is contrived, it demonstrates how these
> patterns can slow the sparse-checkout feature.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt |   1 -
>  dir.c                                 | 154 +++++++++++++++++++++++++-
>  dir.h                                 |  27 +++++
>  t/t1091-sparse-checkout-builtin.sh    |   8 ++
>  4 files changed, 183 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index 463319055b..7ade827370 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -85,7 +85,6 @@ negate patterns. For example, to remove the file `unwanted`:
>  !unwanted
>  ----------------
>
> -
>  ## CONE PATTERN SET
>
>  The full pattern set allows for arbitrary pattern matches and complicated
> diff --git a/dir.c b/dir.c
> index d021c908e5..2c5ff89a72 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -599,6 +599,99 @@ void parse_exclude_pattern(const char **pattern,
>         *patternlen = len;
>  }
>
> +static int el_hashmap_cmp(const void *unused_cmp_data,
> +                         const void *a, const void *b, const void *key)
> +{
> +       const struct exclude_entry *ee1 = a;
> +       const struct exclude_entry *ee2 = b;
> +
> +       return strncmp(ee1->pattern, ee2->pattern, ee1->patternlen);
> +}
> +
> +static void add_exclude_to_hashsets(struct exclude_list *el, struct exclude *x)

Is this for something that is logically an "include" or an "exclude"?
The earlier use of "exclude" for "include"s now how me totally confused
and makes me know I'll have to read the code really carefully and may
still get messed up.

I don't know if there's a better way, but re-using .gitignore internals
for sparse-checkout stuff leads to this inverted switcheroo and makes
things hard to follow.  Is it too late or too hard to easily rename the
"exclude" stuff used by gitignore to something else more neutral?
Sigh...

> +{
> +       struct exclude_entry *e;
> +       char *truncated;
> +       char *data = NULL;
> +
> +       if (!el->use_cone_patterns)
> +               return;
> +
> +       if (x->patternlen >= 4 &&
> +           !strcmp(x->pattern + x->patternlen - 4, "/*/*")) {
> +               if (!(x->flags & EXC_FLAG_NEGATIVE)) {
> +                       /* Not a cone pattern. */
> +                       el->use_cone_patterns = 0;
> +                       warning(_("unrecognized pattern: '%s'"), x->pattern);
> +                       goto clear_hashmaps;
> +               }
> +
> +               truncated = xstrdup(x->pattern);
> +               truncated[x->patternlen - 4] = 0;
> +
> +               e = xmalloc(sizeof(struct exclude_entry));
> +               e->pattern = truncated;
> +               e->patternlen = x->patternlen - 4;
> +               hashmap_entry_init(e, memhash(e->pattern, e->patternlen));
> +
> +               if (!hashmap_get(&el->recursive_hashmap, e, NULL)) {
> +                       /* We did not see the "parent" included */
> +                       warning(_("unrecognized negative pattern: '%s'"), x->pattern);
> +                       free(truncated);
> +                       goto clear_hashmaps;
> +               }
> +
> +               hashmap_add(&el->parent_hashmap, e);
> +               hashmap_remove(&el->recursive_hashmap, e, &data);
> +               free(data);
> +               return;
> +       }
> +
> +       if (x->patternlen >= 2 &&
> +           !strcmp(x->pattern + x->patternlen - 2, "/*")) {
> +               if (x->flags & EXC_FLAG_NEGATIVE) {
> +                       warning(_("unrecognized negative pattern: '%s'"), x->pattern);
> +                       goto clear_hashmaps;
> +               }
> +
> +               e = xmalloc(sizeof(struct exclude_entry));
> +
> +               truncated = xstrdup(x->pattern);
> +               truncated[x->patternlen - 2] = 0;
> +               e->pattern = truncated;
> +               e->patternlen = x->patternlen - 2;
> +               hashmap_entry_init(e, memhash(e->pattern, e->patternlen));
> +
> +               hashmap_add(&el->recursive_hashmap, e);
> +
> +               if (hashmap_get(&el->parent_hashmap, e, NULL)) {
> +                       /* we already included this at the parent level */
> +                       warning(_("your sparse-checkout file may have issues: pattern '%s' is repeated"),
> +                               x->pattern);
> +                       hashmap_remove(&el->parent_hashmap, e, &data);
> +                       free(data);
> +               }
> +               return;
> +       }
> +
> +clear_hashmaps:
> +       hashmap_free(&el->parent_hashmap, 1);
> +       hashmap_free(&el->recursive_hashmap, 1);
> +       el->use_cone_patterns = 0;
> +}
> +
> +static int hashmap_contains_path(struct hashmap *map,
> +                                struct strbuf *pattern)
> +{
> +       struct exclude_entry e;
> +
> +       /* Check straight mapping */
> +       e.pattern = pattern->buf;
> +       e.patternlen = pattern->len;
> +       hashmap_entry_init(&e, memhash(e.pattern, e.patternlen));
> +       return !!hashmap_get(map, &e, NULL);
> +}
> +
>  void add_exclude(const char *string, const char *base,
>                  int baselen, struct exclude_list *el, int srcpos)
>  {
> @@ -623,6 +716,8 @@ void add_exclude(const char *string, const char *base,
>         ALLOC_GROW(el->excludes, el->nr + 1, el->alloc);
>         el->excludes[el->nr++] = x;
>         x->el = el;
> +
> +       add_exclude_to_hashsets(el, x);
>  }
>
>  static int read_skip_worktree_file_from_index(const struct index_state *istate,
> @@ -848,6 +943,10 @@ static int add_excludes_from_buffer(char *buf, size_t size,
>         int i, lineno = 1;
>         char *entry;
>
> +       el->use_cone_patterns = core_sparse_checkout == SPARSE_CHECKOUT_CONE ? 1 : 0;
> +       hashmap_init(&el->recursive_hashmap, el_hashmap_cmp, NULL, 0);
> +       hashmap_init(&el->parent_hashmap, el_hashmap_cmp, NULL, 0);
> +
>         el->filebuf = buf;
>
>         if (skip_utf8_bom(&buf, size))
> @@ -1070,18 +1169,61 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
>
>  /*
>   * Scan the list and let the last match determine the fate.
> - * Return 1 for exclude, 0 for include and -1 for undecided.
> + * Return 0 for exclude, 1 for include and -1 for undecided.

Um...this doesn't make me feel any better about whether we're going to
run into bugs about "include" vs. "exclude".

>   */
>  int is_excluded_from_list(const char *pathname,
>                           int pathlen, const char *basename, int *dtype,
>                           struct exclude_list *el, struct index_state *istate)
>  {
>         struct exclude *exclude;
> -       exclude = last_exclude_matching_from_list(pathname, pathlen, basename,
> -                                                 dtype, el, istate);
> -       if (exclude)
> -               return exclude->flags & EXC_FLAG_NEGATIVE ? 0 : 1;
> -       return -1; /* undecided */
> +       struct strbuf parent_pathname = STRBUF_INIT;
> +       int result = 0;
> +       const char *slash_pos;
> +
> +       if (!el->use_cone_patterns) {
> +               exclude = last_exclude_matching_from_list(pathname, pathlen, basename,
> +                                                               dtype, el, istate);
> +
> +               if (exclude)
> +                       return exclude->flags & EXC_FLAG_NEGATIVE ? 0 : 1;
> +
> +               return -1; /* undecided */
> +       }
> +
> +       strbuf_addch(&parent_pathname, '/');
> +       strbuf_add(&parent_pathname, pathname, pathlen);
> +       slash_pos = strrchr(parent_pathname.buf, '/');
> +
> +       if (slash_pos == parent_pathname.buf) {
> +               /* include every file in root */
> +               result = 1;
> +               goto done;
> +       }
> +
> +       strbuf_setlen(&parent_pathname, slash_pos - parent_pathname.buf);
> +
> +       if (hashmap_contains_path(&el->parent_hashmap, &parent_pathname)) {
> +               result = 1;
> +               goto done;
> +       }
> +
> +       while (parent_pathname.len) {
> +               if (hashmap_contains_path(&el->recursive_hashmap,
> +                                         &parent_pathname)) {
> +                       result = -1;
> +                       goto done;
> +               }
> +
> +               slash_pos = strrchr(parent_pathname.buf, '/');
> +               if (slash_pos == parent_pathname.buf)
> +                       break;
> +
> +               strbuf_setlen(&parent_pathname, slash_pos - parent_pathname.buf);
> +       }
> +
> +done:
> +       strbuf_release(&parent_pathname);
> +       return result;
>  }
>
>  static struct exclude *last_exclude_matching_from_lists(struct dir_struct *dir,
> diff --git a/dir.h b/dir.h
> index 680079bbe3..2d3356d1c0 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -4,6 +4,7 @@
>  /* See Documentation/technical/api-directory-listing.txt */
>
>  #include "cache.h"
> +#include "hashmap.h"
>  #include "strbuf.h"
>
>  struct dir_entry {
> @@ -37,6 +38,13 @@ struct exclude {
>         int srcpos;
>  };
>
> +/* used for hashmaps for cone patterns */
> +struct exclude_entry {
> +       struct hashmap_entry ent;
> +       char *pattern;
> +       size_t patternlen;
> +};
> +
>  /*
>   * Each excludes file will be parsed into a fresh exclude_list which
>   * is appended to the relevant exclude_list_group (either EXC_DIRS or
> @@ -55,6 +63,25 @@ struct exclude_list {
>         const char *src;
>
>         struct exclude **excludes;
> +
> +       /*
> +        * While scanning the excludes, we attempt to match the patterns
> +        * with a more restricted set that allows us to use hashsets for
> +        * matching logic, which is faster than the linear lookup in the
> +        * excludes array above. If non-zero, that check succeeded.
> +        */
> +       unsigned use_cone_patterns;
> +
> +       /*
> +        * Stores paths where everything starting with those paths
> +        * is included.
> +        */
> +       struct hashmap recursive_hashmap;
> +
> +       /*
> +        * Used to check single-level parents of blobs.
> +        */
> +       struct hashmap parent_hashmap;
>  };
>
>  /*
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 8cc377b839..60f10864a1 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -134,6 +134,14 @@ test_expect_success 'cone mode: match patterns' '
>         test_cmp expect dir
>  '
>
> +test_expect_success 'cone mode: warn on bad pattern' '
> +       test_when_finished mv sparse-checkout repo/.git/info &&
> +       cp repo/.git/info/sparse-checkout . &&
> +       echo "!/deep/deeper/*" >>repo/.git/info/sparse-checkout &&
> +       git -C repo read-tree -mu HEAD 2>err &&
> +       test_i18ngrep "unrecognized negative pattern" err
> +'
> +
>  test_expect_success 'sparse-checkout disable' '
>         git -C repo sparse-checkout disable &&
>         test_path_is_missing repo/.git/info/sparse-checkout &&
> --
> gitgitgadget

So, uh, I saw the exclude vs. include thing, started scanning to see
if it was going to get better, saw the next include/exclude thing that
triggered a bell, and I more or less just scrolled quickly to the
bottom checking if anything might catch my eye.  So I didn't look at
this patch very closely at all.  I decided to just start playing with
the series instead...and as far as that goes, this patch is probably
fine because my testing seemed to work.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] sparse-checkout: init and add in cone mode
  2019-08-20 15:11 ` [PATCH 9/9] sparse-checkout: init and add in cone mode Derrick Stolee via GitGitGadget
@ 2019-08-24  5:07   ` Elijah Newren
  0 siblings, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2019-08-24  5:07 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>

Perhaps "sparse-checkout: modify 'init' and 'add' for cone mode" for
the summary?

> From: Derrick Stolee <dstolee@microsoft.com>
>
> To make the cone pattern set easy to use, update the behavior of
> 'git sparse-checkout [init|add]'.

Maybe switch these phrases around (starting with "Update", ending with "use.")

> Add '--cone' flag to 'git sparse-checkout init' to set the config
> option 'core.sparseCheckout=cone'.
>
> When running 'git sparse-checkout add' in cone mode, a user only
> needs to supply a list of recursive folder matches. Git will
> automatically add the necessary parent matches for the leading
> directories.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/sparse-checkout.c          | 134 +++++++++++++++++++++++++++--
>  t/t1091-sparse-checkout-builtin.sh |  35 ++++++++
>  2 files changed, 164 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 77e5235720..0a4e101ddd 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -6,15 +6,22 @@
>  #include "repository.h"
>  #include "run-command.h"
>  #include "strbuf.h"
> +#include "string-list.h"
>
>  static char const * const builtin_sparse_checkout_usage[] = {
>         N_("git sparse-checkout [init|add|list|disable]"),
>         NULL
>  };
>
> +static const char * const builtin_sparse_checkout_init_usage[] = {
> +       N_("git sparse-checkout init [--cone]"),
> +       NULL
> +};
> +
>  struct opts_sparse_checkout {
>         const char *subcommand;
>         int read_stdin;
> +       int cone;
>  } opts;
>
>  static char *get_sparse_checkout_filename(void)
> @@ -41,6 +48,60 @@ static void write_excludes_to_file(FILE *fp, struct exclude_list *el)
>         }
>  }
>
> +static void write_cone_to_file(FILE *fp, struct exclude_list *el)
> +{
> +       int i;
> +       struct exclude_entry *entry;
> +       struct hashmap_iter iter;
> +       struct string_list sl = STRING_LIST_INIT_DUP;
> +
> +       hashmap_iter_init(&el->parent_hashmap, &iter);
> +       while ((entry = hashmap_iter_next(&iter))) {
> +               char *pattern = xstrdup(entry->pattern);
> +               char *converted = pattern;
> +               if (pattern[0] == '/')
> +                       converted++;
> +               if (pattern[entry->patternlen - 1] == '/')
> +                       pattern[entry->patternlen - 1] = 0;
> +               string_list_insert(&sl, converted);
> +               free(pattern);
> +       }
> +
> +       string_list_sort(&sl);

I was worried that if someone had a directory named '(parenthetical)'
or '%looks_like_a_comment' that they wanted to include in cone mode,
that since '*' sorts after '(' and '%' that entries would appear in
sparse-checkout in the wrong order.  But you are comparing on the
directory name, not the full line that will be written to the
sparse-checkout file (i.e. without the trailing '/*'), so my worries
were unfounded and there's no actual problem here.

> +       string_list_remove_duplicates(&sl, 0);
> +
> +       for (i = 0; i < sl.nr; i++) {
> +               char *pattern = sl.items[i].string;
> +
> +               if (!strcmp(pattern, ""))
> +                       fprintf(fp, "/*\n!/*/*\n");
> +               else
> +                       fprintf(fp, "/%s/*\n!/%s/*/*\n", pattern, pattern);
> +       }
> +
> +       string_list_clear(&sl, 0);
> +
> +       hashmap_iter_init(&el->recursive_hashmap, &iter);
> +       while ((entry = hashmap_iter_next(&iter))) {
> +               char *pattern = xstrdup(entry->pattern);
> +               char *converted = pattern;
> +               if (pattern[0] == '/')
> +                       converted++;
> +               if (pattern[entry->patternlen - 1] == '/')
> +                       pattern[entry->patternlen - 1] = 0;
> +               string_list_insert(&sl, converted);
> +               free(pattern);
> +       }
> +
> +       string_list_sort(&sl);
> +       string_list_remove_duplicates(&sl, 0);
> +
> +       for (i = 0; i < sl.nr; i++) {
> +               char *pattern = sl.items[i].string;
> +               fprintf(fp, "/%s/*\n", pattern);
> +       }
> +}
> +
>  static int sparse_checkout_list(int argc, const char **argv)

Should the list mode in cone mode be modified to just show the
directories the user added?  It seems a little weird to show the
internal details of the implementation (all the parent directories and
negated entries and whatnot).  That's also not in a form that users
can pass along to future `sparse-checkout add` invocations.

In fact, maybe the implementation should be changed?  Perhaps we
should write out a .git/info/sparse-checkout-cone file instead with
just the wanted directories (with a trailing slash to permit future
addition of individual paths without bringing parents along)?  It'd be
easier on the parsing, and older git doesn't work with cone mode
anyway:

$ /usr/bin/git status
fatal: bad numeric config value 'cone' for 'core.sparsecheckout': invalid unit

Or, if we need to make old git work reasonably with cone mode, we
could write *both* .git/info/sparse-checkout-cone and
.git/info/sparse-checkout, but the latter begins with a comment
something akin to:

# DO NOT EDIT; auto-generated by 'git sparse-checkout' command and
only used by older git clients; see .git/info/sparse-checkout-clone
for active settings

>  {
>         struct exclude_list el;
> @@ -141,8 +202,21 @@ static int sparse_checkout_init(int argc, const char **argv)
>         char *sparse_filename;
>         FILE *fp;
>         int res;
> +       enum sparse_checkout_mode mode;
>
> -       if (sc_set_config(SPARSE_CHECKOUT_FULL))
> +       static struct option builtin_sparse_checkout_init_options[] = {
> +               OPT_BOOL(0, "cone", &opts.cone,
> +                        N_("initialize the sparse-checkout in cone mode")),
> +               OPT_END(),
> +       };
> +
> +       argc = parse_options(argc, argv, NULL,
> +                            builtin_sparse_checkout_init_options,
> +                            builtin_sparse_checkout_init_usage, 0);
> +
> +       mode = opts.cone ? SPARSE_CHECKOUT_CONE : SPARSE_CHECKOUT_FULL;
> +
> +       if (sc_set_config(mode))
>                 return 1;
>
>         memset(&el, 0, sizeof(el));
> @@ -183,6 +257,34 @@ static int sparse_checkout_init(int argc, const char **argv)
>         return sc_read_tree();
>  }
>
> +static void insert_recursive_pattern(struct exclude_list *el, struct strbuf *path)
> +{
> +       struct exclude_entry *e = xmalloc(sizeof(struct exclude_entry));
> +       e->patternlen = path->len;
> +       e->pattern = strbuf_detach(path, NULL);
> +       hashmap_entry_init(e, memhash(e->pattern, e->patternlen));
> +
> +       hashmap_add(&el->recursive_hashmap, e);
> +
> +       while (e->patternlen) {
> +               char *slash = strrchr(e->pattern, '/');
> +               char *oldpattern = e->pattern;
> +               size_t newlen;
> +
> +               if (!slash)
> +                       break;
> +
> +               newlen = slash - e->pattern;
> +               e = xmalloc(sizeof(struct exclude_entry));
> +               e->patternlen = newlen;
> +               e->pattern = xstrndup(oldpattern, newlen);
> +               hashmap_entry_init(e, memhash(e->pattern, e->patternlen));
> +
> +               if (!hashmap_get(&el->parent_hashmap, e, NULL))
> +                       hashmap_add(&el->parent_hashmap, e);
> +       }
> +}
> +
>  static int sparse_checkout_add(int argc, const char **argv)
>  {
>         struct exclude_list el;
> @@ -196,11 +298,33 @@ static int sparse_checkout_add(int argc, const char **argv)
>         add_excludes_from_file_to_list(sparse_filename, "", 0, &el, NULL);
>
>         fp = fopen(sparse_filename, "w");
> -       write_excludes_to_file(fp, &el);
>
> -       while (!strbuf_getline(&line, stdin)) {
> -               strbuf_trim(&line);
> -               fprintf(fp, "%s\n", line.buf);
> +       if (core_sparse_checkout == SPARSE_CHECKOUT_FULL) {
> +               write_excludes_to_file(fp, &el);
> +
> +               while (!strbuf_getline(&line, stdin)) {
> +                       strbuf_trim(&line);
> +                       fprintf(fp, "%s\n", line.buf);
> +               }
> +       } else if (core_sparse_checkout == SPARSE_CHECKOUT_CONE) {
> +               while (!strbuf_getline(&line, stdin)) {
> +                       strbuf_trim(&line);
> +
> +                       strbuf_trim_trailing_dir_sep(&line);
> +
> +                       if (!line.len)
> +                               continue;
> +
> +                       if (line.buf[0] == '/')
> +                               strbuf_remove(&line, 0, 1);
> +
> +                       if (!line.len)
> +                               continue;
> +
> +                       insert_recursive_pattern(&el, &line);
> +               }
> +
> +               write_cone_to_file(fp, &el);
>         }

No sanity checking that the user provided directory names rather than
globs or regexes?  Won't that silently do nothing, suggesting we need
to warn users?  Or are we just assuming that any 'glob' or 'regex'
characters they provide are actually meant as literal characters in a
pathname?

Also, what if they try to add an exclusion rule for a directory to
undo their add?  e.g.
  echo "!some/subdir/" | git sparse-checkout add
?

Is there a way to undo an add without undoing everything?  (Perhaps if
'git sparse-checkout list' showed just the added directories, it'd be
easy for users to take that output, delete one they don't want, and
then deal with undo and redoing.  Is that how we should handle it?)

>
>         fclose(fp);
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 60f10864a1..3412bafdff 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -157,4 +157,39 @@ test_expect_success 'sparse-checkout disable' '
>         test_cmp expect dir
>  '
>
> +test_expect_success 'cone mode: init and add' '
> +       git -C repo sparse-checkout init --cone &&
> +       git -C repo config --list >config &&
> +       test_i18ngrep "core.sparsecheckout=cone" config &&
> +       ls repo >dir  &&
> +       echo a >expect &&
> +       test_cmp expect dir &&
> +       echo deep/deeper1/deepest | git -C repo sparse-checkout add &&
> +       ls repo >dir  &&
> +       cat >expect <<-EOF &&
> +               a
> +               deep
> +       EOF
> +       ls repo/deep >dir  &&
> +       cat >expect <<-EOF &&
> +               a
> +               deeper1
> +       EOF
> +       ls repo/deep/deeper1 >dir  &&
> +       cat >expect <<-EOF &&
> +               a
> +               deepest
> +       EOF
> +       test_cmp expect dir &&
> +       cat >expect <<-EOF &&
> +               /*
> +               !/*/*
> +               /deep/*
> +               !/deep/*/*
> +               /deep/deeper1/*
> +               !/deep/deeper1/*/*
> +               /deep/deeper1/deepest/*
> +       EOF
> +       test_cmp expect repo/.git/info/sparse-checkout
> +'
>  test_done
> \ No newline at end of file
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode
  2019-08-22 13:10   ` Derrick Stolee
  2019-08-22 14:25     ` Derrick Stolee
@ 2019-08-24  5:40     ` Elijah Newren
  2019-08-26 13:29       ` Derrick Stolee
  2019-09-02 17:55       ` Eric Sunshine
  1 sibling, 2 replies; 29+ messages in thread
From: Elijah Newren @ 2019-08-24  5:40 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano

On Thu, Aug 22, 2019 at 6:10 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/21/2019 5:52 PM, Elijah Newren wrote:
> > On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:

<snip>
> >> Here are some more specific details:
> >>
> >>  * git sparse-checkout init enables core.sparseCheckout and populates the
> >>    sparse-checkout file with patterns that match only the files at root.
> >
> > Does it enable core.sparseCheckout in the current worktree, or for all
> > worktrees?  Do we require extensions.worktreeConfig to be set to true
> > first?  If we don't require extensions.worktreeConfig to be set to
> > true, and users add worktrees later, do they encounter negative
> > surprises (immediately or later)?
>
> This is an interesting scenario that I had not considered. Thanks!
>
> My guess is that we should set `extensions.worktreeConfig=true` to
> avoid surprises. I'll need to play with this to discover the answers
> to these questions:
>
> 1. Where does the worktree look for the sparse-checkout file? Does
>    each worktree have its own sparse-checkout file? Should it?

For the main/first/primary worktree: .git/info/sparse-checkout
For all other worktrees: .git/worktrees/$WORKTREE/info/sparse-checkout

So, yes, each has its own, and from my viewpoint, absolutely yes that
is what we want.

> 2. If I have `extensions.worktreeConfig=true` and `core.sparseCheckout=true`
>    in the current worktree and run `git worktree add`, does the new worktree
>    have `core.sparseCheckout=true`? Can we `git clone --sparse` and then
>    start building sparse worktrees seamlessly?

My $0.02: I think `git worktree add` should not only adopt the setting
of core.sparseCheckout from the current worktree, but it should also
adopt the $GIT_DIR/info/sparse-checkout file too.  Granted, users can
change it to something else, but much like a new shell starts up with
the same current working directory as its parent shell, I think it'd
be most obvious for people to have a worktree that looked similar to
the one they launched it from.

<snip>
> > The default of reading from stdin seems a bit unusual to me, and I
> > worry about having to explain that to users.  I'd rather the add
> > command took positional parameters (anything that doesn't start with a
> > hyphen) and added those, e.g.
> >   $ git sparse-checkout add '/myFolder/*' '
> > with the option of the user specifying --stdin.
>
> I had the same thought, and likely that's where we should go with the
> builtin. For our needs, the input over stdin is more important for
> testing, so I built it first. I will adjust the CLI here to take a set
> of paths over the arguments unless --stdin is given.
>
> >>  * git sparse-checkout disable removes the patterns from the sparse-checkout
> >>    file, disables core.sparseCheckout, and refills the working directory.
> >
> > Does it leave an empty sparse-checkout file around?  Also, what if
> > users have several paths defining their sparse pattern, and want to
> > temporarily get a full checkout and then come back -- do they need to
> > re-specify all the paths?  (Maybe this *is* the route we want to go;
> > I'm just trying to mention any possible negative effects we _might_
> > run into so we can consider them.  It's not quite as relevant in my
> > case since people specify a few toplevel modules and sparse-checkout
> > gets several entries auto-generated for them.)
>
> In this case, there is an intermediate step (that follows the existing
> advice) to modify the sparse-checkout file to contain only "/*\n" then
> run read-tree to fill the working directory before disabling the config
> setting.
>
> Perhaps "disable" is the wrong word to use, as it makes you think that
> there should be an "enable" that can quickly toggle between the two
> modes. Maybe instead it should be "git sparse-checkout reset [empty|full]"
> where you could 'reset' the sparse-checkout to one of two initial
> states:
>
> 1. empty: only files at root are included.
> 2. full: all files are included.
>
> In each case, we would obliterate the existing sparse-checkout entries,
> but hopefully that behavior is more clear from the command names.

Will "reset" be seen as slightly less obvious wording that needs to be
explained to users?  If so, maybe have "undo" and "empty" verbs?  (Of
course, "init" already empties, both when starting from full or when
we have some kind of sparse checkout.)  I dunno, just some ideas.

> >> In cone mode, a user specifies a list of folders which the user wants every
> >> file inside. In addition, the cone adds all blobs that are siblings of the
> >> folders in the directory path to that folder. This makes the directories
> >> look "hydrated" as a user drills down to those recursively-closed folders.
> >> These directories are called "parent" folders, as a file matches them only
> >> if the file's immediate parent is that directory.
> >>
> >> When building a prototype of this feature, I used a separate file to contain
> >> the list of recursively-closed folders and built the hashsets dynamically
> >> based on that file. In this implementation, I tried to maximize the amount
> >> of backwards-compatibility by storing all data in the sparse-checkout file
> >> using patterns recognized by earlier Git versions.
> >>
> >> For example, if we add A/B/C as a recursive folder, then we add the
> >> following patterns to the sparse-checkout file:
> >>
> >> /*
> >> !/*/*
> >> /A/*
> >> !/A/*/*
> >> /A/B/*
> >> !/A/B/*/*
> >> /A/B/C/*
> >>
> >> The alternating positive/negative patterns say "include everything in this
> >> folder, but exclude everything another level deeper". The final pattern has
> >> no matching negation, so is a recursively closed pattern.
> >
> > Oh, um, would there be any option for fast but without grabbing
> > sibling and parent files of requested directories?  And could users
> > still request individual files (not with regex or pathspec, but fully
> > specifying the path) and still get the fast mode?
>
> Exact files could probably be included and still be fast. It requires an
> extra hash check per entry, but that's a small price to pay I think.
>
> With the sibling files, this is something I believe to be user-friendly:
> as a user drills down into the folder they included recursively, there may
> be helpful files along the way, like documentation, project files, etc.
>
> Here is my philosophical position here: a repo can take advantage of the
> sparse-checkout feature if it is properly componetized. Those component
> boundaries are likely at folder boundaries. Any file that exists in a parent
> folder for two components is likely important to _both_ components. If
> a file is large and is not needed by both components, it should be placed
> deeper in the tree, so it can be avoided.
>
> With that philosophy in mind, I designed this to help users fall into the
> "pit of success" when their repo is in a good shape AND to motivate users
> with repos in non-optimal shapes to reorganize.
>
> The thought I had about exact file names is similar: if there is a large
> list of files in a folder where I only need a subset, then how do I know
> if a new file is added that I need? It will not show up in the directory
> without updating the sparse-checkout. A user would discover this need by
> something going wrong when they are not interacting with version control:
> a build.
>
> This is particularly important with the root directory. We need things
> like .gitignore, .gitattributes, README, LICENSE, etc. to be populated
> by default. If there are too many files at root to reasonably work with
> the repo, then the repo should be reorganized using folders.
>
> > Basically, our sparse usage is exclusively specifying leading
> > directories or full pathnames of individual files, but we really want
> > the repo to feel smaller and make sure people notice at a glance.  We
> > have a huge 'modules/' directory, and want people to be able to get
> > just 15 of the 500 or so subdirectories that would appear in that
> > directory with a non-sparse checkout.  And similarly we want to be
> > able to grab just one or two files from a directory of many files.
>
> Your modules/ example seems to work with the feature as designed, as
> you want a set of folders one level deeper. Grabbing one or two files
> from a directory is a direction we can go with the feature, but I will
> continue to believe that should be a rare occurrence compared to including
> a folder recursively.

Oh, you're right, I was misunderstanding what it'd do.  This does look
like it's really close to what we're using, and most of the
differences are probably worth some slightly reshuffling of paths in
the repo.  Now that I've played with it some, it seems really awesome.

Being able to grab one or two files from a directory without grabbing
an entire directory and its parents I think would probably still be
useful, but I do agree that it'd be a rare occurrence.

> >> Note that I have some basic warnings to try and check that the
> >> sparse-checkout file doesn't match what would be written by a cone-mode add.
> >> In such a case, Git writes a warning to stderr and continues with the old
> >> pattern matching algorithm. These checks are currently very barebones, and
> >> would need to be updated with more robust checks for things like regex
> >> characters in the middle of the pattern. As review moves forward (and if we
> >> don't change the data storage) then we could spend more time on this.
> >
> > Instead of trying to validate the sparse-checkout file everytime,
> > perhaps we want to change core.sparseCheckout from a boolean to a
> > tri-state or something where it specifies how to parse the
> > sparse-checkout file?  Or maybe when special directive (some form of
> > comment-looking line) appears at the top of sparse-checkout then we
> > use the hashsets speedup while disallowing general regexes and
> > pathspecs other than leading directories and full pathnames?
>
> In this series, I turn `core.sparseCheckout` into a tri-state, and only
> try to validate the sparse-checkout when `core.sparseCheckout=cone`.
> This avoids spending time on the validation when someone is content using
> the existing feature.
>
> The _intent_ of using the sparse-checkout file and no extra data structure
> was to let other clients (or an older client) read the sparse-checkout data
> and result in the same working directory. One thing I realized after
> submitting is that the tri-state config variable will cause old clients
> to error on parsing the non-boolean value. Instead, in v2 I will introduce
> a new boolean config variable "core.sparseCheckoutCone" that will do the
> same thing as the current series when `core.sparseCheckout=cone` and will
> fix this compat scenario.

Once we are forced to use yet another config variable, we may as well
use yet another config file ($GITDIR/info/sparse-checkout-cone or
something; or maybe a less specific name with greater future
compatibility via some version marking in it).

One thing I noticed twice while using this series was that when I had
an existing sparse checkout it was easy to get into a weird state
where things were messed up, I think due to the fact that
"sparse-checkout init [--cone]" prefers to honor any pre-existing
$GITDIR/info/sparse-checkout file.  Once my config file was very much
not cone-compatible, and another time it was empty and caused
read-tree to error out with something like "there'd be nothing left!".
I manually twiddled with core.sparseCheckout and the sparse-checkout
file and 'git read-tree -mu HEAD' to get it fixed, but I'd rather
avoid others running into such problems.  Sorry I didn't take good
notes on it; I was just trying to get a good feel for this series.

> > I'll try to get some time to look over these patches in the next few days.
>
> I look forward to your feedback! I also have some feedback to respond to
> from my team [1], but I'm waiting to make sure the community likes the
> overall idea before jumping into code style and method organization
> details.

I think this idea is great; I'm a big fan right now.  I'm excited to
see how this will pan out.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode
  2019-08-24  5:40     ` Elijah Newren
@ 2019-08-26 13:29       ` Derrick Stolee
  2019-08-26 18:16         ` Elijah Newren
  2019-09-02 17:55       ` Eric Sunshine
  1 sibling, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2019-08-26 13:29 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano

On 8/24/2019 1:40 AM, Elijah Newren wrote:
> On Thu, Aug 22, 2019 at 6:10 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 8/21/2019 5:52 PM, Elijah Newren wrote:
>>> On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
> 
> <snip>
>>>> Here are some more specific details:
>>>>
>>>>  * git sparse-checkout init enables core.sparseCheckout and populates the
>>>>    sparse-checkout file with patterns that match only the files at root.
>>>
>>> Does it enable core.sparseCheckout in the current worktree, or for all
>>> worktrees?  Do we require extensions.worktreeConfig to be set to true
>>> first?  If we don't require extensions.worktreeConfig to be set to
>>> true, and users add worktrees later, do they encounter negative
>>> surprises (immediately or later)?
>>
>> This is an interesting scenario that I had not considered. Thanks!
>>
>> My guess is that we should set `extensions.worktreeConfig=true` to
>> avoid surprises. I'll need to play with this to discover the answers
>> to these questions:
>>
>> 1. Where does the worktree look for the sparse-checkout file? Does
>>    each worktree have its own sparse-checkout file? Should it?
> 
> For the main/first/primary worktree: .git/info/sparse-checkout
> For all other worktrees: .git/worktrees/$WORKTREE/info/sparse-checkout
> 
> So, yes, each has its own, and from my viewpoint, absolutely yes that
> is what we want.

Thanks for the info! I will definitely consider this in the next version,
and include a test to verify the interaction.

>> 2. If I have `extensions.worktreeConfig=true` and `core.sparseCheckout=true`
>>    in the current worktree and run `git worktree add`, does the new worktree
>>    have `core.sparseCheckout=true`? Can we `git clone --sparse` and then
>>    start building sparse worktrees seamlessly?
> 
> My $0.02: I think `git worktree add` should not only adopt the setting
> of core.sparseCheckout from the current worktree, but it should also
> adopt the $GIT_DIR/info/sparse-checkout file too.  Granted, users can
> change it to something else, but much like a new shell starts up with
> the same current working directory as its parent shell, I think it'd
> be most obvious for people to have a worktree that looked similar to
> the one they launched it from.

This seems natural to me: I'm adding a new worktree and expect the
settings to match my current worktree. If we later want to extend
`git worktree add` to include an `--empty-cone` option (that creates
the worktree as if it was created by `git clone --sparse-cone`) we
could do that independently.

> <snip>
>>> The default of reading from stdin seems a bit unusual to me, and I
>>> worry about having to explain that to users.  I'd rather the add
>>> command took positional parameters (anything that doesn't start with a
>>> hyphen) and added those, e.g.
>>>   $ git sparse-checkout add '/myFolder/*' '
>>> with the option of the user specifying --stdin.
>>
>> I had the same thought, and likely that's where we should go with the
>> builtin. For our needs, the input over stdin is more important for
>> testing, so I built it first. I will adjust the CLI here to take a set
>> of paths over the arguments unless --stdin is given.
>>
>>>>  * git sparse-checkout disable removes the patterns from the sparse-checkout
>>>>    file, disables core.sparseCheckout, and refills the working directory.
>>>
>>> Does it leave an empty sparse-checkout file around?  Also, what if
>>> users have several paths defining their sparse pattern, and want to
>>> temporarily get a full checkout and then come back -- do they need to
>>> re-specify all the paths?  (Maybe this *is* the route we want to go;
>>> I'm just trying to mention any possible negative effects we _might_
>>> run into so we can consider them.  It's not quite as relevant in my
>>> case since people specify a few toplevel modules and sparse-checkout
>>> gets several entries auto-generated for them.)
>>
>> In this case, there is an intermediate step (that follows the existing
>> advice) to modify the sparse-checkout file to contain only "/*\n" then
>> run read-tree to fill the working directory before disabling the config
>> setting.
>>
>> Perhaps "disable" is the wrong word to use, as it makes you think that
>> there should be an "enable" that can quickly toggle between the two
>> modes. Maybe instead it should be "git sparse-checkout reset [empty|full]"
>> where you could 'reset' the sparse-checkout to one of two initial
>> states:
>>
>> 1. empty: only files at root are included.
>> 2. full: all files are included.
>>
>> In each case, we would obliterate the existing sparse-checkout entries,
>> but hopefully that behavior is more clear from the command names.
> 
> Will "reset" be seen as slightly less obvious wording that needs to be
> explained to users?  If so, maybe have "undo" and "empty" verbs?  (Of
> course, "init" already empties, both when starting from full or when
> we have some kind of sparse checkout.)  I dunno, just some ideas.

Thanks for pointing out that my word choice could be improved. I'll
consider several options.

>>>> In cone mode, a user specifies a list of folders which the user wants every
>>>> file inside. In addition, the cone adds all blobs that are siblings of the
>>>> folders in the directory path to that folder. This makes the directories
>>>> look "hydrated" as a user drills down to those recursively-closed folders.
>>>> These directories are called "parent" folders, as a file matches them only
>>>> if the file's immediate parent is that directory.
>>>>
>>>> When building a prototype of this feature, I used a separate file to contain
>>>> the list of recursively-closed folders and built the hashsets dynamically
>>>> based on that file. In this implementation, I tried to maximize the amount
>>>> of backwards-compatibility by storing all data in the sparse-checkout file
>>>> using patterns recognized by earlier Git versions.
>>>>
>>>> For example, if we add A/B/C as a recursive folder, then we add the
>>>> following patterns to the sparse-checkout file:
>>>>
>>>> /*
>>>> !/*/*
>>>> /A/*
>>>> !/A/*/*
>>>> /A/B/*
>>>> !/A/B/*/*
>>>> /A/B/C/*
>>>>
>>>> The alternating positive/negative patterns say "include everything in this
>>>> folder, but exclude everything another level deeper". The final pattern has
>>>> no matching negation, so is a recursively closed pattern.
>>>
>>> Oh, um, would there be any option for fast but without grabbing
>>> sibling and parent files of requested directories?  And could users
>>> still request individual files (not with regex or pathspec, but fully
>>> specifying the path) and still get the fast mode?
>>
>> Exact files could probably be included and still be fast. It requires an
>> extra hash check per entry, but that's a small price to pay I think.
>>
>> With the sibling files, this is something I believe to be user-friendly:
>> as a user drills down into the folder they included recursively, there may
>> be helpful files along the way, like documentation, project files, etc.
>>
>> Here is my philosophical position here: a repo can take advantage of the
>> sparse-checkout feature if it is properly componetized. Those component
>> boundaries are likely at folder boundaries. Any file that exists in a parent
>> folder for two components is likely important to _both_ components. If
>> a file is large and is not needed by both components, it should be placed
>> deeper in the tree, so it can be avoided.
>>
>> With that philosophy in mind, I designed this to help users fall into the
>> "pit of success" when their repo is in a good shape AND to motivate users
>> with repos in non-optimal shapes to reorganize.
>>
>> The thought I had about exact file names is similar: if there is a large
>> list of files in a folder where I only need a subset, then how do I know
>> if a new file is added that I need? It will not show up in the directory
>> without updating the sparse-checkout. A user would discover this need by
>> something going wrong when they are not interacting with version control:
>> a build.
>>
>> This is particularly important with the root directory. We need things
>> like .gitignore, .gitattributes, README, LICENSE, etc. to be populated
>> by default. If there are too many files at root to reasonably work with
>> the repo, then the repo should be reorganized using folders.
>>
>>> Basically, our sparse usage is exclusively specifying leading
>>> directories or full pathnames of individual files, but we really want
>>> the repo to feel smaller and make sure people notice at a glance.  We
>>> have a huge 'modules/' directory, and want people to be able to get
>>> just 15 of the 500 or so subdirectories that would appear in that
>>> directory with a non-sparse checkout.  And similarly we want to be
>>> able to grab just one or two files from a directory of many files.
>>
>> Your modules/ example seems to work with the feature as designed, as
>> you want a set of folders one level deeper. Grabbing one or two files
>> from a directory is a direction we can go with the feature, but I will
>> continue to believe that should be a rare occurrence compared to including
>> a folder recursively.
> 
> Oh, you're right, I was misunderstanding what it'd do.  This does look
> like it's really close to what we're using, and most of the
> differences are probably worth some slightly reshuffling of paths in
> the repo.  Now that I've played with it some, it seems really awesome.
> 
> Being able to grab one or two files from a directory without grabbing
> an entire directory and its parents I think would probably still be
> useful, but I do agree that it'd be a rare occurrence.

I think we can leave the file-by-file addition for later, but may need
to make certain design decisions in this initial version to avoid issues
with adding that feature in the future. (Perhaps the recursive-folder input
should have "/" at the end, to clearly state these are folders, not files.)

>>>> Note that I have some basic warnings to try and check that the
>>>> sparse-checkout file doesn't match what would be written by a cone-mode add.
>>>> In such a case, Git writes a warning to stderr and continues with the old
>>>> pattern matching algorithm. These checks are currently very barebones, and
>>>> would need to be updated with more robust checks for things like regex
>>>> characters in the middle of the pattern. As review moves forward (and if we
>>>> don't change the data storage) then we could spend more time on this.
>>>
>>> Instead of trying to validate the sparse-checkout file everytime,
>>> perhaps we want to change core.sparseCheckout from a boolean to a
>>> tri-state or something where it specifies how to parse the
>>> sparse-checkout file?  Or maybe when special directive (some form of
>>> comment-looking line) appears at the top of sparse-checkout then we
>>> use the hashsets speedup while disallowing general regexes and
>>> pathspecs other than leading directories and full pathnames?
>>
>> In this series, I turn `core.sparseCheckout` into a tri-state, and only
>> try to validate the sparse-checkout when `core.sparseCheckout=cone`.
>> This avoids spending time on the validation when someone is content using
>> the existing feature.
>>
>> The _intent_ of using the sparse-checkout file and no extra data structure
>> was to let other clients (or an older client) read the sparse-checkout data
>> and result in the same working directory. One thing I realized after
>> submitting is that the tri-state config variable will cause old clients
>> to error on parsing the non-boolean value. Instead, in v2 I will introduce
>> a new boolean config variable "core.sparseCheckoutCone" that will do the
>> same thing as the current series when `core.sparseCheckout=cone` and will
>> fix this compat scenario.
> 
> Once we are forced to use yet another config variable, we may as well
> use yet another config file ($GITDIR/info/sparse-checkout-cone or
> something; or maybe a less specific name with greater future
> compatibility via some version marking in it).

I'm hesitant to include a second "source of truth," as that can cause
issues when users modify the sparse-checkout file directly. Since the
existing way to interact with the sparse-checkout is to directly edit
the file, I want to be as careful as possible around users who modify
that themselves. The caveat is that if they specify "cone" mode then
they will get warnings and worse performance if they modify it outside
the limited patterns we allow.

> One thing I noticed twice while using this series was that when I had
> an existing sparse checkout it was easy to get into a weird state
> where things were messed up, I think due to the fact that
> "sparse-checkout init [--cone]" prefers to honor any pre-existing
> $GITDIR/info/sparse-checkout file.  Once my config file was very much
> not cone-compatible, and another time it was empty and caused
> read-tree to error out with something like "there'd be nothing left!".
> I manually twiddled with core.sparseCheckout and the sparse-checkout
> file and 'git read-tree -mu HEAD' to get it fixed, but I'd rather
> avoid others running into such problems.  Sorry I didn't take good
> notes on it; I was just trying to get a good feel for this series.

Thanks for this interesting use case! I think an empty file should be
updated with the root files, since Git does not think that is a valid
state. The current series must only check for existence, not content.
 
>>> I'll try to get some time to look over these patches in the next few days.
>>
>> I look forward to your feedback! I also have some feedback to respond to
>> from my team [1], but I'm waiting to make sure the community likes the
>> overall idea before jumping into code style and method organization
>> details.
> 
> I think this idea is great; I'm a big fan right now.  I'm excited to
> see how this will pan out.

Thanks! I'll be taking a close look at your patch-by-patch feedback
this week and hope to have a non-RFC v2 soon.

-Stolee


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode
  2019-08-26 13:29       ` Derrick Stolee
@ 2019-08-26 18:16         ` Elijah Newren
  2019-08-26 19:16           ` Derrick Stolee
  0 siblings, 1 reply; 29+ messages in thread
From: Elijah Newren @ 2019-08-26 18:16 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano

On Mon, Aug 26, 2019 at 6:29 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/24/2019 1:40 AM, Elijah Newren wrote:
> > On Thu, Aug 22, 2019 at 6:10 AM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 8/21/2019 5:52 PM, Elijah Newren wrote:
> >>> On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
> >>> <gitgitgadget@gmail.com> wrote:
> >

> >> In this series, I turn `core.sparseCheckout` into a tri-state, and only
> >> try to validate the sparse-checkout when `core.sparseCheckout=cone`.
> >> This avoids spending time on the validation when someone is content using
> >> the existing feature.
> >>
> >> The _intent_ of using the sparse-checkout file and no extra data structure
> >> was to let other clients (or an older client) read the sparse-checkout data
> >> and result in the same working directory. One thing I realized after
> >> submitting is that the tri-state config variable will cause old clients
> >> to error on parsing the non-boolean value. Instead, in v2 I will introduce
> >> a new boolean config variable "core.sparseCheckoutCone" that will do the
> >> same thing as the current series when `core.sparseCheckout=cone` and will
> >> fix this compat scenario.
> >
> > Once we are forced to use yet another config variable, we may as well
> > use yet another config file ($GITDIR/info/sparse-checkout-cone or
> > something; or maybe a less specific name with greater future
> > compatibility via some version marking in it).
>
> I'm hesitant to include a second "source of truth," as that can cause
> issues when users modify the sparse-checkout file directly. Since the
> existing way to interact with the sparse-checkout is to directly edit
> the file, I want to be as careful as possible around users who modify
> that themselves. The caveat is that if they specify "cone" mode then
> they will get warnings and worse performance if they modify it outside
> the limited patterns we allow.

Wait...does that mean you allow mixing and matching both regular style
sparse-checkout declarations with cone-mode style declarations within
the same file?  Are the non-cone mode entries ignored?  Does it
fallback to non-cone mode for all entries?  Or does that mean you
allow checking out both old and new styles of filesets, where you
optimize the cone-mode style declarations with your hashsets, and have
the remaining ones fall back to the old O(N*M) matching?  (I think it
does the last of those, right?)

If you support both, it sounds like you naturally support doing cone
mode with allowing people to also grab a handful of additional files
without the rest of their directories or parents.  It's just that
folks who want to do that will ask for a way to turn off any warnings
you spew, and if you turn the warnings off, then people who meant to
get cone behavior but mistyped stuff might complain about no warnings.
Hmm....

(Just trying to think things through out loud; I don't necessarily
know what's good or bad here, just voicing what I think might happen.)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode
  2019-08-26 18:16         ` Elijah Newren
@ 2019-08-26 19:16           ` Derrick Stolee
  0 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2019-08-26 19:16 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano

On 8/26/2019 2:16 PM, Elijah Newren wrote:
> On Mon, Aug 26, 2019 at 6:29 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 8/24/2019 1:40 AM, Elijah Newren wrote:
>>> On Thu, Aug 22, 2019 at 6:10 AM Derrick Stolee <stolee@gmail.com> wrote:
>>>>
>>>> On 8/21/2019 5:52 PM, Elijah Newren wrote:
>>>>> On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
>>>>> <gitgitgadget@gmail.com> wrote:
>>>
> 
>>>> In this series, I turn `core.sparseCheckout` into a tri-state, and only
>>>> try to validate the sparse-checkout when `core.sparseCheckout=cone`.
>>>> This avoids spending time on the validation when someone is content using
>>>> the existing feature.
>>>>
>>>> The _intent_ of using the sparse-checkout file and no extra data structure
>>>> was to let other clients (or an older client) read the sparse-checkout data
>>>> and result in the same working directory. One thing I realized after
>>>> submitting is that the tri-state config variable will cause old clients
>>>> to error on parsing the non-boolean value. Instead, in v2 I will introduce
>>>> a new boolean config variable "core.sparseCheckoutCone" that will do the
>>>> same thing as the current series when `core.sparseCheckout=cone` and will
>>>> fix this compat scenario.
>>>
>>> Once we are forced to use yet another config variable, we may as well
>>> use yet another config file ($GITDIR/info/sparse-checkout-cone or
>>> something; or maybe a less specific name with greater future
>>> compatibility via some version marking in it).
>>
>> I'm hesitant to include a second "source of truth," as that can cause
>> issues when users modify the sparse-checkout file directly. Since the
>> existing way to interact with the sparse-checkout is to directly edit
>> the file, I want to be as careful as possible around users who modify
>> that themselves. The caveat is that if they specify "cone" mode then
>> they will get warnings and worse performance if they modify it outside
>> the limited patterns we allow.
> 
> Wait...does that mean you allow mixing and matching both regular style
> sparse-checkout declarations with cone-mode style declarations within
> the same file?  Are the non-cone mode entries ignored?  Does it
> fallback to non-cone mode for all entries?  Or does that mean you
> allow checking out both old and new styles of filesets, where you
> optimize the cone-mode style declarations with your hashsets, and have
> the remaining ones fall back to the old O(N*M) matching?  (I think it
> does the last of those, right?)
> 
> If you support both, it sounds like you naturally support doing cone
> mode with allowing people to also grab a handful of additional files
> without the rest of their directories or parents.  It's just that
> folks who want to do that will ask for a way to turn off any warnings
> you spew, and if you turn the warnings off, then people who meant to
> get cone behavior but mistyped stuff might complain about no warnings.
> Hmm....
> 
> (Just trying to think things through out loud; I don't necessarily
> know what's good or bad here, just voicing what I think might happen.)

The way I built the current series is that we honor what is in the
sparse-checkout as historically allowed. Always.

If a user modifies the sparse-checkout to have patterns that don't match
those that are added in cone mode, then Git warns the user this is the
case then reverts to the old pattern-by-pattern logic. This is to have
the Git client always match what another Git client would expect. (This
could be JGit or an older version of Git.) A user could always disable
cone mode to remove the warning and keep their sparse-checkout in its
current state.

Note: I have not made the "non-code-mode pattern" checks very robust.
For instance, I don't check the middle characters for wildcards. This
needs to happen at write time, too. The plan is to make these more
robust in future versions of the series.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode
  2019-08-24  5:40     ` Elijah Newren
  2019-08-26 13:29       ` Derrick Stolee
@ 2019-09-02 17:55       ` Eric Sunshine
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Sunshine @ 2019-09-02 17:55 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Junio C Hamano

On Sat, Aug 24, 2019 at 1:40 AM Elijah Newren <newren@gmail.com> wrote:
> My $0.02: I think `git worktree add` should not only adopt the setting
> of core.sparseCheckout from the current worktree, but it should also
> adopt the $GIT_DIR/info/sparse-checkout file too.

As another example in favor of imbuing "git worktree add" with
first-class support for this feature (via command-line option and/or
inheriting existing settings), the commit message of ef2a0ac9a0
(worktree: add: introduce --checkout option, 2016-03-29) specifically
sites sparse checkout as the motivation for adding --no-checkout to
"git worktree add".

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/9] sparse-checkout: create 'init' subcommand
  2019-08-23 23:02   ` Elijah Newren
@ 2019-09-11 14:27     ` Derrick Stolee
  2019-09-11 20:28     ` Derrick Stolee
  1 sibling, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2019-09-11 14:27 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On 8/23/2019 7:02 PM, Elijah Newren wrote:
> On Tue, Aug 20, 2019 at 8:13 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Getting started with a sparse-checkout file can be daunting. Help
>> users start their sparse enlistment using 'git sparse-checkout init'.
>> This will set 'core.sparseCheckout=true' in their config, write
>> an initial set of patterns to the sparse-checkout file, and update
>> their working directory.
>>
>> Using 'git read-tree' to clear directories does not work cleanly
>> on Windows, so manually delete directories that are tracked by Git
>> before running read-tree.
> 
> Is that a bug in read-tree that needs to be fixed?

Just to follow up on this: it turns out that this is NOT a bug in
read-tree, but rather a side-effect of our custom "core.gvfs" config
setting. In the virtualized world, we didn't want Git to hard-delete
a folder just because we marked everything sparse.

By removing that option from my environment, the deletions work as
expected.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/9] sparse-checkout: create 'init' subcommand
  2019-08-23 23:02   ` Elijah Newren
  2019-09-11 14:27     ` Derrick Stolee
@ 2019-09-11 20:28     ` Derrick Stolee
  1 sibling, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2019-09-11 20:28 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On 8/23/2019 7:02 PM, Elijah Newren wrote:
> On Tue, Aug 20, 2019 at 8:13 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> +static int sc_read_tree(void)
>> +{
>> +       struct argv_array argv = ARGV_ARRAY_INIT;
>> +       int result = 0;
>> +       argv_array_pushl(&argv, "read-tree", "-m", "-u", "HEAD", NULL);
>> +
>> +       if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
>> +               error(_("failed to update index with new sparse-checkout paths"));
>> +               result = 1;
>> +       }
> 
> `git read-tree -m -u HEAD` will fail if the index has any higher stage
> entries in it, even if those higher stage entries correspond to files
> which are included in the sparseness patterns and thus would not need
> an update.  It might be nice if we can find a way to provide a better
> error message, and/or implement the read-tree -m -u HEAD internally in
> a way that will allow us to not fail if the conflicted files are
> included in the sparse set.

I agree that this is not the _best_ thing to do, but it does mimic the
current recommendation for a user interacting with sparse-checkout.

I'll rename this helper to something like "update_working_directory()"
so it can be swapped with a different implementation later, after we
work out those usability kinks.

The other thing that is needed here: allow reverting the sparse-checkout
settings if this fails. I'll isolate that to a new commit so we can
examine that behavior carefully.

> 
>> +
>> +       argv_array_clear(&argv);
>> +       return result;
>> +}
>> +
>> +static int sc_enable_config(void)
>> +{
>> +       struct argv_array argv = ARGV_ARRAY_INIT;
>> +       int result = 0;
>> +       argv_array_pushl(&argv, "config", "--add", "core.sparseCheckout", "true", NULL);
> > Why --add?  That seems really odd to me.

Yeah, that's a mistake. Good find.

> 
> This should also have "--worktree".  And this function should either
> set extensions.worktreeConfig to true or die if it isn't already set;
> not sure which.  There's some UI and documentation stuff to figure out
> here...

I was planning to switch my `git config` subcommand to use in-process
methods, but I'm struggling to find a way to ensure we follow the
`--worktree` option. It likely would work if extensions.worktreeConfig
was enabled when the process starts, but adding it in-process likely
causes a problem.

> 
>> +
>> +       if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
>> +               error(_("failed to enable core.sparseCheckout"));
>> +               result = 1;
>> +       }
>> +
>> +       argv_array_clear(&argv);
>> +       return result;
>> +}
>> +
>> +static int delete_directory(const struct object_id *oid, struct strbuf *base,
>> +               const char *pathname, unsigned mode, int stage, void *context)
>> +{
>> +       struct strbuf dirname = STRBUF_INIT;
>> +       struct stat sb;
>> +
>> +       strbuf_addstr(&dirname, the_repository->worktree);
>> +       strbuf_addch(&dirname, '/');
>> +       strbuf_addstr(&dirname, pathname);
>> +
>> +       if (stat(dirname.buf, &sb) || !(sb.st_mode & S_IFDIR))
>> +               return 0;
>> +
>> +       if (remove_dir_recursively(&dirname, 0))
> 
> flags = 0 implies not REMOVE_DIR_EMPTY_ONLY.  I'm not familiar with
> remove_dir_recursively(), but won't this delete everything...including
> untracked files?  If so, that sounds like a bug.
This whole thing isn't needed any more, since read-tree does the right
thing.

> 
>> +               warning(_("failed to remove directory '%s'"),
>> +                       dirname.buf);
>> +
>> +       strbuf_release(&dirname);
>> +       return 0;
>> +}
>> +
>> +static int sparse_checkout_init(int argc, const char **argv)
>> +{
>> +       struct tree *t;
>> +       struct object_id oid;
>> +       struct exclude_list el;
>> +       static struct pathspec pathspec;
>> +       char *sparse_filename;
>> +       FILE *fp;
>> +       int res;
>> +
>> +       if (sc_enable_config())
>> +               return 1;
>> +
>> +       memset(&el, 0, sizeof(el));
>> +
>> +       sparse_filename = get_sparse_checkout_filename();
>> +       res = add_excludes_from_file_to_list(sparse_filename, "", 0, &el, NULL);
> 
> But 'el' isn't used again?  Why are we getting the list of files from
> sparse_filename then?

This is the only way I could think to check that the sparse-checkout file parses well without just doing the file open myself. Maybe we only need to check if the file exists (and is not empty).

>> +
>> +       /* If we already have a sparse-checkout file, use it. */
>> +       if (res >= 0) {
>> +               free(sparse_filename);
>> +               goto reset_dir;
>> +       }
>> +
>> +       /* initial mode: all blobs at root */
>> +       fp = fopen(sparse_filename, "w");
>> +       free(sparse_filename);
>> +       fprintf(fp, "/*\n!/*/*\n");
>> +       fclose(fp);
> 
> Makes sense.
> 
>> +
>> +       /* remove all directories in the root, if tracked by Git */
>> +       if (get_oid("HEAD", &oid)) {
>> +               /* assume we are in a fresh repo */
>> +               return 0;
>> +       }
>> +
>> +       t = parse_tree_indirect(&oid);
>> +
>> +       parse_pathspec(&pathspec, PATHSPEC_ALL_MAGIC &
>> +                                 ~(PATHSPEC_FROMTOP | PATHSPEC_LITERAL),
>> +                      PATHSPEC_PREFER_CWD,
>> +                      "", NULL);
>> +
>> +       if (read_tree_recursive(the_repository, t, "", 0, 0, &pathspec,
>> +                               delete_directory, NULL))
>> +               return 1;
> 
> Since this is only needed on Windows, as per your commit message,
> should it be #ifdef'd?  Or is this actually a bug that should be fixed
> in "git read-tree -mu HEAD"?

(this will not be needed, but thanks!)
 
>> +
>> +reset_dir:
>> +       return sc_read_tree();
>> +}
>> +
> 
> The rest looks fine.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, back to index

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20 15:11 [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Derrick Stolee via GitGitGadget
2019-08-20 15:11 ` [PATCH 1/9] sparse-checkout: create builtin with 'list' subcommand Derrick Stolee via GitGitGadget
2019-08-23 22:30   ` Elijah Newren
2019-08-20 15:11 ` [PATCH 2/9] sparse-checkout: create 'init' subcommand Derrick Stolee via GitGitGadget
2019-08-23 23:02   ` Elijah Newren
2019-09-11 14:27     ` Derrick Stolee
2019-09-11 20:28     ` Derrick Stolee
2019-08-20 15:11 ` [PATCH 3/9] clone: add --sparse mode Derrick Stolee via GitGitGadget
2019-08-23 23:17   ` Elijah Newren
2019-08-20 15:11 ` [PATCH 4/9] sparse-checkout: 'add' subcommand Derrick Stolee via GitGitGadget
2019-08-23 23:30   ` Elijah Newren
2019-08-20 15:11 ` [PATCH 6/9] trace2:experiment: clear_ce_flags_1 Jeff Hostetler via GitGitGadget
2019-08-24  0:08   ` Elijah Newren
2019-08-20 15:11 ` [PATCH 5/9] sparse-checkout: create 'disable' subcommand Derrick Stolee via GitGitGadget
2019-08-23 23:50   ` Elijah Newren
2019-08-20 15:11 ` [PATCH 7/9] sparse-checkout: add 'cone' mode Derrick Stolee via GitGitGadget
2019-08-24  0:31   ` Elijah Newren
2019-08-20 15:11 ` [PATCH 8/9] sparse-checkout: use hashmaps for cone patterns Derrick Stolee via GitGitGadget
2019-08-24  4:56   ` Elijah Newren
2019-08-20 15:11 ` [PATCH 9/9] sparse-checkout: init and add in cone mode Derrick Stolee via GitGitGadget
2019-08-24  5:07   ` Elijah Newren
2019-08-21 21:52 ` [PATCH 0/9] [RFC] New sparse-checkout builtin and "cone" mode Elijah Newren
2019-08-22 13:10   ` Derrick Stolee
2019-08-22 14:25     ` Derrick Stolee
2019-08-24  5:40     ` Elijah Newren
2019-08-26 13:29       ` Derrick Stolee
2019-08-26 18:16         ` Elijah Newren
2019-08-26 19:16           ` Derrick Stolee
2019-09-02 17:55       ` Eric Sunshine

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox