git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/7] Parallel Checkout (part 3)
@ 2021-04-22 15:17 Matheus Tavares
  2021-04-22 15:17 ` [PATCH 1/7] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
                   ` (7 more replies)
  0 siblings, 8 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-04-22 15:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git

This is the last part of the parallel checkout series :) I have a couple
other small optimization and tuning ideas for later but, with this part,
the main functionality should be now complete.

This is based on mt/parallel-checkout-part-2.

The first three patches add parallel checkout support for three
codepaths that could not use the parallel mode yet:

- git checkout-index
- git checkout --patch
- git checkout <pathspec>

All these three call `checkout_entry()` directly instead of
`unpack_trees()`, so we need to enable parallel checkout on them before
they call `checkout_entry()`.

The four remaining patches add tests for the parallel checkout
framework.

Matheus Tavares (7):
  make_transient_cache_entry(): optionally alloc from mem_pool
  builtin/checkout.c: complete parallel checkout support
  checkout-index: add parallel checkout support
  parallel-checkout: add tests for basic operations
  parallel-checkout: add tests related to path collisions
  parallel-checkout: add tests related to .gitattributes
  ci: run test round with parallel-checkout enabled

 builtin/checkout--worker.c              |   2 +-
 builtin/checkout-index.c                |  24 +--
 builtin/checkout.c                      |  20 ++-
 builtin/difftool.c                      |   2 +-
 cache.h                                 |  11 +-
 ci/run-build-and-tests.sh               |   1 +
 parallel-checkout.c                     |  18 +++
 read-cache.c                            |  12 +-
 t/README                                |   4 +
 t/lib-encoding.sh                       |  25 +++
 t/lib-parallel-checkout.sh              |  40 +++++
 t/t0028-working-tree-encoding.sh        |  25 +--
 t/t2080-parallel-checkout-basics.sh     | 150 ++++++++++++++++++
 t/t2081-parallel-checkout-collisions.sh | 162 ++++++++++++++++++++
 t/t2082-parallel-checkout-attributes.sh | 194 ++++++++++++++++++++++++
 unpack-trees.c                          |   2 +-
 16 files changed, 643 insertions(+), 49 deletions(-)
 create mode 100644 t/lib-encoding.sh
 create mode 100644 t/lib-parallel-checkout.sh
 create mode 100755 t/t2080-parallel-checkout-basics.sh
 create mode 100755 t/t2081-parallel-checkout-collisions.sh
 create mode 100755 t/t2082-parallel-checkout-attributes.sh

-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 1/7] make_transient_cache_entry(): optionally alloc from mem_pool
  2021-04-22 15:17 [PATCH 0/7] Parallel Checkout (part 3) Matheus Tavares
@ 2021-04-22 15:17 ` Matheus Tavares
  2021-04-22 15:17 ` [PATCH 2/7] builtin/checkout.c: complete parallel checkout support Matheus Tavares
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-04-22 15:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git

Allow make_transient_cache_entry() to optionally receive a mem_pool
struct in which it should allocate the entry. This will be used in the
following patch, to store some transient entries which should persist
until parallel checkout finishes.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/checkout--worker.c |  2 +-
 builtin/checkout.c         |  2 +-
 builtin/difftool.c         |  2 +-
 cache.h                    | 11 ++++++-----
 read-cache.c               | 12 ++++++++----
 unpack-trees.c             |  2 +-
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/checkout--worker.c b/builtin/checkout--worker.c
index 31e0de2f7e..289a9b8f89 100644
--- a/builtin/checkout--worker.c
+++ b/builtin/checkout--worker.c
@@ -39,7 +39,7 @@ static void packet_to_pc_item(const char *buffer, int len,
 	}
 
 	memset(pc_item, 0, sizeof(*pc_item));
-	pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+	pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len, NULL);
 	pc_item->ce->ce_namelen = fixed_portion->name_len;
 	pc_item->ce->ce_mode = fixed_portion->ce_mode;
 	memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 4c696ef480..db667d0267 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -291,7 +291,7 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko
 	if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
 		die(_("Unable to add merge result for '%s'"), path);
 	free(result_buf.ptr);
-	ce = make_transient_cache_entry(mode, &oid, path, 2);
+	ce = make_transient_cache_entry(mode, &oid, path, 2, NULL);
 	if (!ce)
 		die(_("make_cache_entry failed for path '%s'"), path);
 	status = checkout_entry(ce, state, NULL, nr_checkouts);
diff --git a/builtin/difftool.c b/builtin/difftool.c
index ef25729d49..afacbcd581 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -323,7 +323,7 @@ static int checkout_path(unsigned mode, struct object_id *oid,
 	struct cache_entry *ce;
 	int ret;
 
-	ce = make_transient_cache_entry(mode, oid, path, 0);
+	ce = make_transient_cache_entry(mode, oid, path, 0, NULL);
 	ret = checkout_entry(ce, state, NULL, NULL);
 
 	discard_cache_entry(ce);
diff --git a/cache.h b/cache.h
index 148d9ab5f1..b6b42cc3f3 100644
--- a/cache.h
+++ b/cache.h
@@ -356,16 +356,17 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate,
 					   size_t name_len);
 
 /*
- * Create a cache_entry that is not intended to be added to an index.
- * Caller is responsible for discarding the cache_entry
- * with `discard_cache_entry`.
+ * Create a cache_entry that is not intended to be added to an index. If `mp`
+ * is not NULL, the entry is allocated within the given memory pool. Caller is
+ * responsible for discarding "loose" entries with `discard_cache_entry()` and
+ * the mem_pool with `mem_pool_discard(mp, should_validate_cache_entries())`.
  */
 struct cache_entry *make_transient_cache_entry(unsigned int mode,
 					       const struct object_id *oid,
 					       const char *path,
-					       int stage);
+					       int stage, struct mem_pool *mp);
 
-struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp);
 
 /*
  * Discard cache entry.
diff --git a/read-cache.c b/read-cache.c
index 5a907af2fb..eb389ccb8f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -813,8 +813,10 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate, size_t le
 	return mem_pool__ce_calloc(find_mem_pool(istate), len);
 }
 
-struct cache_entry *make_empty_transient_cache_entry(size_t len)
+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp)
 {
+	if (mp)
+		return mem_pool__ce_calloc(mp, len);
 	return xcalloc(1, cache_entry_size(len));
 }
 
@@ -848,8 +850,10 @@ struct cache_entry *make_cache_entry(struct index_state *istate,
 	return ret;
 }
 
-struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct object_id *oid,
-					       const char *path, int stage)
+struct cache_entry *make_transient_cache_entry(unsigned int mode,
+					       const struct object_id *oid,
+					       const char *path, int stage,
+					       struct mem_pool *mp)
 {
 	struct cache_entry *ce;
 	int len;
@@ -860,7 +864,7 @@ struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct o
 	}
 
 	len = strlen(path);
-	ce = make_empty_transient_cache_entry(len);
+	ce = make_empty_transient_cache_entry(len, mp);
 
 	oidcpy(&ce->oid, oid);
 	memcpy(ce->name, path, len);
diff --git a/unpack-trees.c b/unpack-trees.c
index 4b77e52c6b..fa5b7ab7ee 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1034,7 +1034,7 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	size_t len = traverse_path_len(info, tree_entry_len(n));
 	struct cache_entry *ce =
 		is_transient ?
-		make_empty_transient_cache_entry(len) :
+		make_empty_transient_cache_entry(len, NULL) :
 		make_empty_cache_entry(istate, len);
 
 	ce->ce_mode = create_ce_mode(n->mode);
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 2/7] builtin/checkout.c: complete parallel checkout support
  2021-04-22 15:17 [PATCH 0/7] Parallel Checkout (part 3) Matheus Tavares
  2021-04-22 15:17 ` [PATCH 1/7] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
@ 2021-04-22 15:17 ` Matheus Tavares
  2021-04-23 16:19   ` Derrick Stolee
  2021-04-22 15:17 ` [PATCH 3/7] checkout-index: add " Matheus Tavares
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Matheus Tavares @ 2021-04-22 15:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git

There is one code path in builtin/checkout.c which still doesn't benefit
from parallel checkout because it calls checkout_entry() directly,
instead of unpack_trees(). Let's add parallel checkout support for this
missing spot as well. Note: the transient cache entries allocated in
checkout_merged() are now allocated in a mem_pool which is only
discarded after parallel checkout finishes. This is done because the
entries need to be valid when run_parallel_checkout() is called.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/checkout.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index db667d0267..b71dc08430 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -27,6 +27,7 @@
 #include "wt-status.h"
 #include "xdiff-interface.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 static const char * const checkout_usage[] = {
 	N_("git checkout [<options>] <branch>"),
@@ -230,7 +231,8 @@ static int checkout_stage(int stage, const struct cache_entry *ce, int pos,
 		return error(_("path '%s' does not have their version"), ce->name);
 }
 
-static int checkout_merged(int pos, const struct checkout *state, int *nr_checkouts)
+static int checkout_merged(int pos, const struct checkout *state,
+			   int *nr_checkouts, struct mem_pool *ce_mem_pool)
 {
 	struct cache_entry *ce = active_cache[pos];
 	const char *path = ce->name;
@@ -291,11 +293,10 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko
 	if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
 		die(_("Unable to add merge result for '%s'"), path);
 	free(result_buf.ptr);
-	ce = make_transient_cache_entry(mode, &oid, path, 2, NULL);
+	ce = make_transient_cache_entry(mode, &oid, path, 2, ce_mem_pool);
 	if (!ce)
 		die(_("make_cache_entry failed for path '%s'"), path);
 	status = checkout_entry(ce, state, NULL, nr_checkouts);
-	discard_cache_entry(ce);
 	return status;
 }
 
@@ -359,16 +360,22 @@ static int checkout_worktree(const struct checkout_opts *opts,
 	int nr_checkouts = 0, nr_unmerged = 0;
 	int errs = 0;
 	int pos;
+	int pc_workers, pc_threshold;
+	struct mem_pool ce_mem_pool;
 
 	state.force = 1;
 	state.refresh_cache = 1;
 	state.istate = &the_index;
 
+	mem_pool_init(&ce_mem_pool, 0);
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
 	init_checkout_metadata(&state.meta, info->refname,
 			       info->commit ? &info->commit->object.oid : &info->oid,
 			       NULL);
 
 	enable_delayed_checkout(&state);
+	if (pc_workers > 1)
+		init_parallel_checkout();
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
@@ -384,10 +391,15 @@ static int checkout_worktree(const struct checkout_opts *opts,
 						       &nr_checkouts, opts->overlay_mode);
 			else if (opts->merge)
 				errs |= checkout_merged(pos, &state,
-							&nr_unmerged);
+							&nr_unmerged,
+							&ce_mem_pool);
 			pos = skip_same_name(ce, pos) - 1;
 		}
 	}
+	if (pc_workers > 1)
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+					      NULL, NULL);
+	mem_pool_discard(&ce_mem_pool, should_validate_cache_entries());
 	remove_marked_cache_entries(&the_index, 1);
 	remove_scheduled_dirs();
 	errs |= finish_delayed_checkout(&state, &nr_checkouts);
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 3/7] checkout-index: add parallel checkout support
  2021-04-22 15:17 [PATCH 0/7] Parallel Checkout (part 3) Matheus Tavares
  2021-04-22 15:17 ` [PATCH 1/7] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
  2021-04-22 15:17 ` [PATCH 2/7] builtin/checkout.c: complete parallel checkout support Matheus Tavares
@ 2021-04-22 15:17 ` Matheus Tavares
  2021-04-23 18:32   ` Derrick Stolee
  2021-04-22 15:17 ` [PATCH 4/7] parallel-checkout: add tests for basic operations Matheus Tavares
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Matheus Tavares @ 2021-04-22 15:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git

Note: previously, `checkout_all()` would not return on errors, but
instead call `exit()` with a non-zero code. However, it only did that
after calling `checkout_entry()` for all index entries, thus not
stopping on the first error, but attempting to write the maximum number
of entries possible. In order to maintain this behavior we now propagate
`checkout_all()`s error status to `cmd_checkout_index()`, so that it can
call `run_parallel_checkout()` and attempt to write the queued entries
before exiting with the error code.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/checkout-index.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index c0bf4ac1b2..e8a82ea9ed 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -12,6 +12,7 @@
 #include "cache-tree.h"
 #include "parse-options.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 #define CHECKOUT_ALL 4
 static int nul_term_line;
@@ -115,7 +116,7 @@ static int checkout_file(const char *name, const char *prefix)
 	return -1;
 }
 
-static void checkout_all(const char *prefix, int prefix_length)
+static int checkout_all(const char *prefix, int prefix_length)
 {
 	int i, errs = 0;
 	struct cache_entry *last_ce = NULL;
@@ -142,11 +143,7 @@ static void checkout_all(const char *prefix, int prefix_length)
 	}
 	if (last_ce && to_tempfile)
 		write_tempfile_record(last_ce->name, prefix);
-	if (errs)
-		/* we have already done our error reporting.
-		 * exit with the same code as die().
-		 */
-		exit(128);
+	return !!errs;
 }
 
 static const char * const builtin_checkout_index_usage[] = {
@@ -182,6 +179,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 	int force = 0, quiet = 0, not_new = 0;
 	int index_opt = 0;
 	int err = 0;
+	int pc_workers, pc_threshold;
 	struct option builtin_checkout_index_options[] = {
 		OPT_BOOL('a', "all", &all,
 			N_("check out all files in the index")),
@@ -236,6 +234,10 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 		hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 	}
 
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+	if (pc_workers > 1)
+		init_parallel_checkout();
+
 	/* Check out named files first */
 	for (i = 0; i < argc; i++) {
 		const char *arg = argv[i];
@@ -275,12 +277,16 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	if (all)
+		err |= checkout_all(prefix, prefix_length);
+
+	if (pc_workers > 1)
+		err |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+					     NULL, NULL);
+
 	if (err)
 		return 1;
 
-	if (all)
-		checkout_all(prefix, prefix_length);
-
 	if (is_lock_file_locked(&lock_file) &&
 	    write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die("Unable to write new index file");
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 4/7] parallel-checkout: add tests for basic operations
  2021-04-22 15:17 [PATCH 0/7] Parallel Checkout (part 3) Matheus Tavares
                   ` (2 preceding siblings ...)
  2021-04-22 15:17 ` [PATCH 3/7] checkout-index: add " Matheus Tavares
@ 2021-04-22 15:17 ` Matheus Tavares
  2021-04-23 19:18   ` Derrick Stolee
  2021-04-22 15:17 ` [PATCH 5/7] parallel-checkout: add tests related to path collisions Matheus Tavares
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Matheus Tavares @ 2021-04-22 15:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, Jonathan Nieder

Add tests to populate the working tree during clone and checkout using
sequential and parallel mode, to confirm that they produce identical
results. Also test basic checkout mechanics, such as checking for
symlinks in the leading directories and the abidance to --force.

Note: some helper functions are added to a common lib file which is only
included by t2080 for now. But they will also be used by other
parallel-checkout tests in the following patches.

Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/lib-parallel-checkout.sh          |  37 +++++++
 t/t2080-parallel-checkout-basics.sh | 150 ++++++++++++++++++++++++++++
 2 files changed, 187 insertions(+)
 create mode 100644 t/lib-parallel-checkout.sh
 create mode 100755 t/t2080-parallel-checkout-basics.sh

diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
new file mode 100644
index 0000000000..39fd36fdf6
--- /dev/null
+++ b/t/lib-parallel-checkout.sh
@@ -0,0 +1,37 @@
+# Helpers for t208* tests
+
+set_checkout_config () {
+	if test $# -ne 2
+	then
+		BUG "set_checkout_config() requires two arguments"
+	fi &&
+
+	test_config_global checkout.workers $1 &&
+	test_config_global checkout.thresholdForParallelism $2
+}
+
+# Run "${@:2}" and check that $1 checkout workers were used
+test_checkout_workers () {
+	if test $# -lt 2
+	then
+		BUG "too few arguments to test_checkout_workers()"
+	fi &&
+
+	expected_workers=$1 &&
+	shift &&
+
+	rm -f trace &&
+	GIT_TRACE2="$(pwd)/trace" "$@" &&
+
+	workers=$(grep "child_start\[..*\] git checkout--worker" trace | wc -l) &&
+	test $workers -eq $expected_workers &&
+	rm -f trace
+}
+
+# Verify that both the working tree and the index were created correctly
+verify_checkout () {
+	git -C "$1" diff-index --quiet HEAD -- &&
+	git -C "$1" diff-index --quiet --cached HEAD -- &&
+	git -C "$1" status --porcelain >"$1".status &&
+	test_must_be_empty "$1".status
+}
diff --git a/t/t2080-parallel-checkout-basics.sh b/t/t2080-parallel-checkout-basics.sh
new file mode 100755
index 0000000000..0cb1493cdc
--- /dev/null
+++ b/t/t2080-parallel-checkout-basics.sh
@@ -0,0 +1,150 @@
+#!/bin/sh
+
+test_description='parallel-checkout basics
+
+Ensure that parallel-checkout basically works on clone and checkout, spawning
+the required number of workers and correctly populating both the index and the
+working tree.
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+# Test parallel-checkout with a branch switch containing file creations,
+# deletions, and modification; with different entry types. Switching from B1 to
+# B2 will have the following changes:
+#
+# - a (file):      modified
+# - e/x (file):    deleted
+# - b (symlink):   deleted
+# - b/f (file):    created
+# - e (symlink):   created
+# - d (submodule): created
+#
+test_expect_success SYMLINKS 'setup repo for checkout with various types of changes' '
+	git init various &&
+	(
+		cd various &&
+		git checkout -b B1 &&
+		echo a >a &&
+		mkdir e &&
+		echo e/x >e/x &&
+		ln -s e b &&
+		git add -A &&
+		git commit -m B1 &&
+
+		git checkout -b B2 &&
+		echo modified >a &&
+		rm -rf e &&
+		rm b &&
+		mkdir b &&
+		echo b/f >b/f &&
+		ln -s b e &&
+		git init d &&
+		test_commit -C d f &&
+		git submodule add ./d &&
+		git add -A &&
+		git commit -m B2 &&
+
+		git checkout --recurse-submodules B1
+	)
+'
+
+test_expect_success SYMLINKS 'sequential checkout' '
+	cp -R various various_sequential &&
+	set_checkout_config 1 0 &&
+	test_checkout_workers 0 \
+		git -C various_sequential checkout --recurse-submodules B2 &&
+	verify_checkout various_sequential
+'
+
+test_expect_success SYMLINKS 'parallel checkout' '
+	cp -R various various_parallel &&
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 \
+		git -C various_parallel checkout --recurse-submodules B2 &&
+	verify_checkout various_parallel
+'
+
+test_expect_success SYMLINKS 'fallback to sequential checkout (threshold)' '
+	cp -R various various_sequential_fallback &&
+	set_checkout_config 2 100 &&
+	test_checkout_workers 0 \
+		git -C various_sequential_fallback checkout --recurse-submodules B2 &&
+	verify_checkout various_sequential_fallback
+'
+
+test_expect_success SYMLINKS 'parallel checkout on clone' '
+	git -C various checkout --recurse-submodules B2 &&
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 \
+		git clone --recurse-submodules various various_parallel_clone &&
+	verify_checkout various_parallel_clone
+'
+
+test_expect_success SYMLINKS 'fallback to sequential checkout on clone (threshold)' '
+	git -C various checkout --recurse-submodules B2 &&
+	set_checkout_config 2 100 &&
+	test_checkout_workers 0 \
+		git clone --recurse-submodules various various_sequential_fallback_clone &&
+	verify_checkout various_sequential_fallback_clone
+'
+
+# Just to be paranoid, actually compare the working trees' contents directly.
+test_expect_success SYMLINKS 'compare the working trees' '
+	rm -rf various_*/.git &&
+	rm -rf various_*/d/.git &&
+
+	diff -r various_sequential various_parallel &&
+	diff -r various_sequential various_sequential_fallback &&
+	diff -r various_sequential various_parallel_clone &&
+	diff -r various_sequential various_sequential_fallback_clone
+'
+
+test_expect_success 'parallel checkout respects --[no]-force' '
+	set_checkout_config 2 0 &&
+	git init dirty &&
+	(
+		cd dirty &&
+		mkdir D &&
+		test_commit D/F &&
+		test_commit F &&
+
+		rm -rf D &&
+		echo changed >D &&
+		echo changed >F.t &&
+
+		# We expect 0 workers because there is nothing to be done
+		test_checkout_workers 0 git checkout HEAD &&
+		test_path_is_file D &&
+		grep changed D &&
+		grep changed F.t &&
+
+		test_checkout_workers 2 git checkout --force HEAD &&
+		test_path_is_dir D &&
+		grep D/F D/F.t &&
+		grep F F.t
+	)
+'
+
+test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading dirs' '
+	set_checkout_config 2 0 &&
+	git init symlinks &&
+	(
+		cd symlinks &&
+		mkdir D untracked &&
+		# Commit 2 files to have enough work for 2 parallel workers
+		test_commit D/A &&
+		test_commit D/B &&
+		rm -rf D &&
+		ln -s untracked D &&
+
+		test_checkout_workers 2 git checkout --force HEAD &&
+		! test -h D &&
+		grep D/A D/A.t &&
+		grep D/B D/B.t
+	)
+'
+
+test_done
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 5/7] parallel-checkout: add tests related to path collisions
  2021-04-22 15:17 [PATCH 0/7] Parallel Checkout (part 3) Matheus Tavares
                   ` (3 preceding siblings ...)
  2021-04-22 15:17 ` [PATCH 4/7] parallel-checkout: add tests for basic operations Matheus Tavares
@ 2021-04-22 15:17 ` Matheus Tavares
  2021-04-22 15:17 ` [PATCH 6/7] parallel-checkout: add tests related to .gitattributes Matheus Tavares
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-04-22 15:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git

Add tests to confirm that path collisions are properly detected by
checkout workers, both to avoid race conditions and to report colliding
entries on clone.

Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 parallel-checkout.c                     |   4 +
 t/lib-parallel-checkout.sh              |   4 +-
 t/t2081-parallel-checkout-collisions.sh | 162 ++++++++++++++++++++++++
 3 files changed, 168 insertions(+), 2 deletions(-)
 create mode 100755 t/t2081-parallel-checkout-collisions.sh

diff --git a/parallel-checkout.c b/parallel-checkout.c
index 09e8b10a35..6fb3f1e6c9 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -8,6 +8,7 @@
 #include "sigchain.h"
 #include "streaming.h"
 #include "thread-utils.h"
+#include "trace2.h"
 
 struct pc_worker {
 	struct child_process cp;
@@ -326,6 +327,7 @@ void write_pc_item(struct parallel_checkout_item *pc_item,
 	if (dir_sep && !has_dirs_only_path(path.buf, dir_sep - path.buf,
 					   state->base_dir_len)) {
 		pc_item->status = PC_ITEM_COLLIDED;
+		trace2_data_string("pcheckout", NULL, "collision/dirname", path.buf);
 		goto out;
 	}
 
@@ -341,6 +343,8 @@ void write_pc_item(struct parallel_checkout_item *pc_item,
 			 * call should have already caught these cases.
 			 */
 			pc_item->status = PC_ITEM_COLLIDED;
+			trace2_data_string("pcheckout", NULL,
+					   "collision/basename", path.buf);
 		} else {
 			error_errno("failed to open file '%s'", path.buf);
 			pc_item->status = PC_ITEM_FAILED;
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index 39fd36fdf6..16ee18389b 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -21,12 +21,12 @@ test_checkout_workers () {
 	shift &&
 
 	rm -f trace &&
-	GIT_TRACE2="$(pwd)/trace" "$@" &&
+	GIT_TRACE2="$(pwd)/trace" "$@" 2>&8 &&
 
 	workers=$(grep "child_start\[..*\] git checkout--worker" trace | wc -l) &&
 	test $workers -eq $expected_workers &&
 	rm -f trace
-}
+} 8>&2 2>&4
 
 # Verify that both the working tree and the index were created correctly
 verify_checkout () {
diff --git a/t/t2081-parallel-checkout-collisions.sh b/t/t2081-parallel-checkout-collisions.sh
new file mode 100755
index 0000000000..f6fcfc0c1e
--- /dev/null
+++ b/t/t2081-parallel-checkout-collisions.sh
@@ -0,0 +1,162 @@
+#!/bin/sh
+
+test_description="path collisions during parallel checkout
+
+Parallel checkout must detect path collisions to:
+
+1) Avoid racily writing to different paths that represent the same file on disk.
+2) Report the colliding entries on clone.
+
+The tests in this file exercise parallel checkout's collision detection code in
+both these mechanics.
+"
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+TEST_ROOT="$PWD"
+
+test_expect_success CASE_INSENSITIVE_FS 'setup' '
+	empty_oid=$(git hash-object -w --stdin </dev/null) &&
+	cat >objs <<-EOF &&
+	100644 $empty_oid	FILE_X
+	100644 $empty_oid	FILE_x
+	100644 $empty_oid	file_X
+	100644 $empty_oid	file_x
+	EOF
+	git update-index --index-info <objs &&
+	git commit -m "colliding files" &&
+	git tag basename_collision &&
+
+	write_script "$TEST_ROOT"/logger_script <<-\EOF
+	echo "$@" >>filter.log
+	EOF
+'
+
+test_workers_in_event_trace ()
+{
+	test $1 -eq $(grep ".event.:.child_start..*checkout--worker" $2 | wc -l)
+}
+
+test_expect_success CASE_INSENSITIVE_FS 'worker detects basename collision' '
+	GIT_TRACE2_EVENT="$(pwd)/trace" git \
+		-c checkout.workers=2 -c checkout.thresholdForParallelism=0 \
+		checkout . &&
+
+	test_workers_in_event_trace 2 trace &&
+	collisions=$(grep -i "category.:.pcheckout.,.key.:.collision/basename.,.value.:.file_x.}" trace | wc -l) &&
+	test $collisions -eq 3
+'
+
+test_expect_success CASE_INSENSITIVE_FS 'worker detects dirname collision' '
+	test_config filter.logger.smudge "\"$TEST_ROOT/logger_script\" %f" &&
+	empty_oid=$(git hash-object -w --stdin </dev/null) &&
+
+	# By setting a filter command to "a", we make it ineligible for parallel
+	# checkout, and thus it is checked out *first*. This way we can ensure
+	# that "A/B" and "A/C" will both collide with the regular file "a".
+	#
+	attr_oid=$(echo "a filter=logger" | git hash-object -w --stdin) &&
+
+	cat >objs <<-EOF &&
+	100644 $empty_oid	A/B
+	100644 $empty_oid	A/C
+	100644 $empty_oid	a
+	100644 $attr_oid	.gitattributes
+	EOF
+	git rm -rf . &&
+	git update-index --index-info <objs &&
+
+	rm -f trace filter.log &&
+	GIT_TRACE2_EVENT="$(pwd)/trace" git \
+		-c checkout.workers=2 -c checkout.thresholdForParallelism=0 \
+		checkout . &&
+
+	# Check that "a" (and only "a") was filtered
+	echo a >expected.log &&
+	test_cmp filter.log expected.log &&
+
+	# Check that it used the right number of workers and detected the collisions
+	test_workers_in_event_trace 2 trace &&
+	grep "category.:.pcheckout.,.key.:.collision/dirname.,.value.:.A/B.}" trace &&
+	grep "category.:.pcheckout.,.key.:.collision/dirname.,.value.:.A/C.}" trace
+'
+
+test_expect_success SYMLINKS,CASE_INSENSITIVE_FS 'do not follow symlinks colliding with leading dir' '
+	empty_oid=$(git hash-object -w --stdin </dev/null) &&
+	symlink_oid=$(echo "./e" | git hash-object -w --stdin) &&
+	mkdir e &&
+
+	cat >objs <<-EOF &&
+	120000 $symlink_oid	D
+	100644 $empty_oid	d/x
+	100644 $empty_oid	e/y
+	EOF
+	git rm -rf . &&
+	git update-index --index-info <objs &&
+
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 git checkout . &&
+	test_path_is_dir e &&
+	test_path_is_missing e/x
+'
+
+# The two following tests check that parallel checkout correctly reports
+# colliding entries on clone. The sequential code detects a collision by
+# calling lstat() before trying to open(O_CREAT) a file. (Note that this only
+# works for clone.) Then, to find the pair of a colliding item k, it searches
+# cache_entry[0, k-1]. This is not sufficient in parallel checkout because:
+#
+# - A colliding file may be created between the lstat() and open() calls;
+# - A colliding entry might appear in the second half of the cache_entry array.
+#
+test_expect_success CASE_INSENSITIVE_FS 'collision report on clone (w/ racy file creation)' '
+	git reset --hard basename_collision &&
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 git clone . clone-repo 2>stderr &&
+
+	grep FILE_X stderr &&
+	grep FILE_x stderr &&
+	grep file_X stderr &&
+	grep file_x stderr &&
+	grep "the following paths have collided" stderr
+'
+
+# This test ensures that the collision report code is correctly looking for
+# colliding peers in the second half of the cache_entry array. This is done by
+# defining a smudge command for the *last* array entry, which makes it
+# non-eligible for parallel-checkout. Thus, it is checked out *first*, before
+# spawning the workers.
+#
+# Note: this test doesn't work on Windows because, on this system, the
+# collision report code uses strcmp() to find the colliding pairs when
+# core.ignoreCase is false. And we need this setting for this test so that only
+# 'file_x' matches the pattern of the filter attribute. But the test works on
+# OSX, where the colliding pairs are found using inode.
+#
+test_expect_success CASE_INSENSITIVE_FS,!MINGW,!CYGWIN \
+	'collision report on clone (w/ colliding peer after the detected entry)' '
+
+	test_config_global filter.logger.smudge "\"$TEST_ROOT/logger_script\" %f" &&
+	git reset --hard basename_collision &&
+	echo "file_x filter=logger" >.gitattributes &&
+	git add .gitattributes &&
+	git commit -m "filter for file_x" &&
+
+	rm -rf clone-repo &&
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 \
+		git -c core.ignoreCase=false clone . clone-repo 2>stderr &&
+
+	grep FILE_X stderr &&
+	grep FILE_x stderr &&
+	grep file_X stderr &&
+	grep file_x stderr &&
+	grep "the following paths have collided" stderr &&
+
+	# Check that only "file_x" was filtered
+	echo file_x >expected.log &&
+	test_cmp clone-repo/filter.log expected.log
+'
+
+test_done
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 6/7] parallel-checkout: add tests related to .gitattributes
  2021-04-22 15:17 [PATCH 0/7] Parallel Checkout (part 3) Matheus Tavares
                   ` (4 preceding siblings ...)
  2021-04-22 15:17 ` [PATCH 5/7] parallel-checkout: add tests related to path collisions Matheus Tavares
@ 2021-04-22 15:17 ` Matheus Tavares
  2021-04-23 19:48   ` Derrick Stolee
  2021-04-22 15:17 ` [PATCH 7/7] ci: run test round with parallel-checkout enabled Matheus Tavares
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
  7 siblings, 1 reply; 47+ messages in thread
From: Matheus Tavares @ 2021-04-22 15:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git

Add tests to confirm that the `struct conv_attrs` data is correctly
passed from the main process to the workers, and that they can properly
convert the blobs before writing them to the working tree.

Also check that parallel-ineligible entries, such as regular files that
require external filters, are correctly smudge and written when
parallel-checkout is enabled.

Note: to avoid repeating code, some helper functions are extracted from
t0028 into a common lib file.

Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/lib-encoding.sh                       |  25 +++
 t/t0028-working-tree-encoding.sh        |  25 +--
 t/t2082-parallel-checkout-attributes.sh | 194 ++++++++++++++++++++++++
 3 files changed, 220 insertions(+), 24 deletions(-)
 create mode 100644 t/lib-encoding.sh
 create mode 100755 t/t2082-parallel-checkout-attributes.sh

diff --git a/t/lib-encoding.sh b/t/lib-encoding.sh
new file mode 100644
index 0000000000..c52ffbbed5
--- /dev/null
+++ b/t/lib-encoding.sh
@@ -0,0 +1,25 @@
+# Encoding helpers used by t0028 and t2082
+
+test_lazy_prereq NO_UTF16_BOM '
+	test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
+'
+
+test_lazy_prereq NO_UTF32_BOM '
+	test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
+'
+
+write_utf16 () {
+	if test_have_prereq NO_UTF16_BOM
+	then
+		printf '\376\377'
+	fi &&
+	iconv -f UTF-8 -t UTF-16
+}
+
+write_utf32 () {
+	if test_have_prereq NO_UTF32_BOM
+	then
+		printf '\0\0\376\377'
+	fi &&
+	iconv -f UTF-8 -t UTF-32
+}
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index f970a9806b..82905a2156 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -6,33 +6,10 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY/lib-encoding.sh"
 
 GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
 
-test_lazy_prereq NO_UTF16_BOM '
-	test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
-'
-
-test_lazy_prereq NO_UTF32_BOM '
-	test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
-'
-
-write_utf16 () {
-	if test_have_prereq NO_UTF16_BOM
-	then
-		printf '\376\377'
-	fi &&
-	iconv -f UTF-8 -t UTF-16
-}
-
-write_utf32 () {
-	if test_have_prereq NO_UTF32_BOM
-	then
-		printf '\0\0\376\377'
-	fi &&
-	iconv -f UTF-8 -t UTF-32
-}
-
 test_expect_success 'setup test files' '
 	git config core.eol lf &&
 
diff --git a/t/t2082-parallel-checkout-attributes.sh b/t/t2082-parallel-checkout-attributes.sh
new file mode 100755
index 0000000000..2525457961
--- /dev/null
+++ b/t/t2082-parallel-checkout-attributes.sh
@@ -0,0 +1,194 @@
+#!/bin/sh
+
+test_description='parallel-checkout: attributes
+
+Verify that parallel-checkout correctly creates files that require
+conversions, as specified in .gitattributes. The main point here is
+to check that the conv_attr data is correctly sent to the workers
+and that it contains sufficient information to smudge files
+properly (without access to the index or attribute stack).
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+. "$TEST_DIRECTORY/lib-encoding.sh"
+
+test_expect_success 'parallel-checkout with ident' '
+	set_checkout_config 2 0 &&
+	git init ident &&
+	(
+		cd ident &&
+		echo "A ident" >.gitattributes &&
+		echo "\$Id\$" >A &&
+		echo "\$Id\$" >B &&
+		git add -A &&
+		git commit -m id &&
+
+		rm A B &&
+		test_checkout_workers 2 git reset --hard &&
+		hexsz=$(test_oid hexsz) &&
+		grep -E "\\\$Id: [0-9a-f]{$hexsz} \\\$" A &&
+		grep "\\\$Id\\\$" B
+	)
+'
+
+test_expect_success 'parallel-checkout with re-encoding' '
+	set_checkout_config 2 0 &&
+	git init encoding &&
+	(
+		cd encoding &&
+		echo text >utf8-text &&
+		write_utf16 <utf8-text >utf16-text &&
+
+		echo "A working-tree-encoding=UTF-16" >.gitattributes &&
+		cp utf16-text A &&
+		cp utf8-text B &&
+		git add A B .gitattributes &&
+		git commit -m encoding &&
+
+		# Check that A is stored in UTF-8
+		git cat-file -p :A >A.internal &&
+		test_cmp_bin utf8-text A.internal &&
+
+		rm A B &&
+		test_checkout_workers 2 git checkout A B &&
+
+		# Check that A (and only A) is re-encoded during checkout
+		test_cmp_bin utf16-text A &&
+		test_cmp_bin utf8-text B
+	)
+'
+
+test_expect_success 'parallel-checkout with eol conversions' '
+	set_checkout_config 2 0 &&
+	git init eol &&
+	(
+		cd eol &&
+		printf "multi\r\nline\r\ntext" >crlf-text &&
+		printf "multi\nline\ntext" >lf-text &&
+
+		git config core.autocrlf false &&
+		echo "A eol=crlf" >.gitattributes &&
+		cp crlf-text A &&
+		cp lf-text B &&
+		git add A B .gitattributes &&
+		git commit -m eol &&
+
+		# Check that A is stored with LF format
+		git cat-file -p :A >A.internal &&
+		test_cmp_bin lf-text A.internal &&
+
+		rm A B &&
+		test_checkout_workers 2 git checkout A B &&
+
+		# Check that A (and only A) is converted to CRLF during checkout
+		test_cmp_bin crlf-text A &&
+		test_cmp_bin lf-text B
+	)
+'
+
+# Entries that require an external filter are not eligible for parallel
+# checkout. Check that both the parallel-eligible and non-eligible entries are
+# properly writen in a single checkout operation.
+#
+test_expect_success 'parallel-checkout and external filter' '
+	set_checkout_config 2 0 &&
+	git init filter &&
+	(
+		cd filter &&
+		write_script <<-\EOF rot13.sh &&
+		tr \
+		  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" \
+		  "nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM"
+		EOF
+
+		git config filter.rot13.clean "\"$(pwd)/rot13.sh\"" &&
+		git config filter.rot13.smudge "\"$(pwd)/rot13.sh\"" &&
+		git config filter.rot13.required true &&
+
+		echo abcd >original &&
+		echo nopq >rot13 &&
+
+		echo "A filter=rot13" >.gitattributes &&
+		cp original A &&
+		cp original B &&
+		cp original C &&
+		git add A B C .gitattributes &&
+		git commit -m filter &&
+
+		# Check that A (and only A) was cleaned
+		git cat-file -p :A >A.internal &&
+		test_cmp rot13 A.internal &&
+		git cat-file -p :B >B.internal &&
+		test_cmp original B.internal &&
+		git cat-file -p :C >C.internal &&
+		test_cmp original C.internal &&
+
+		rm A B C *.internal &&
+		test_checkout_workers 2 git checkout A B C &&
+
+		# Check that A (and only A) was smudged during checkout
+		test_cmp original A &&
+		test_cmp original B &&
+		test_cmp original C
+	)
+'
+
+# The delayed queue is independent from the parallel queue, and they should be
+# able to work together in the same checkout process.
+#
+test_expect_success PERL 'parallel-checkout and delayed checkout' '
+	write_script rot13-filter.pl "$PERL_PATH" \
+		<"$TEST_DIRECTORY"/t0021/rot13-filter.pl &&
+
+	test_config_global filter.delay.process \
+		"\"$(pwd)/rot13-filter.pl\" --always-delay \"$(pwd)/delayed.log\" clean smudge delay" &&
+	test_config_global filter.delay.required true &&
+
+	echo "abcd" >original &&
+	echo "nopq" >rot13 &&
+
+	git init delayed &&
+	(
+		cd delayed &&
+		echo "*.d filter=delay" >.gitattributes &&
+		cp ../original W.d &&
+		cp ../original X.d &&
+		cp ../original Y &&
+		cp ../original Z &&
+		git add -A &&
+		git commit -m delayed &&
+
+		# Check that *.d files were cleaned
+		git cat-file -p :W.d >W.d.internal &&
+		test_cmp W.d.internal ../rot13 &&
+		git cat-file -p :X.d >X.d.internal &&
+		test_cmp X.d.internal ../rot13 &&
+		git cat-file -p :Y >Y.internal &&
+		test_cmp Y.internal ../original &&
+		git cat-file -p :Z >Z.internal &&
+		test_cmp Z.internal ../original &&
+
+		rm *
+	) &&
+
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 git -C delayed checkout -f &&
+	verify_checkout delayed &&
+
+	# Check that the *.d files got to the delay queue and were filtered
+	grep "smudge W.d .* \[DELAYED\]" delayed.log &&
+	grep "smudge X.d .* \[DELAYED\]" delayed.log &&
+	test_cmp delayed/W.d original &&
+	test_cmp delayed/X.d original &&
+
+	# Check that the parallel-eligible entries went to the right queue and
+	# were not filtered
+	! grep "smudge Y .* \[DELAYED\]" delayed.log &&
+	! grep "smudge Z .* \[DELAYED\]" delayed.log &&
+	test_cmp delayed/Y original &&
+	test_cmp delayed/Z original
+'
+
+test_done
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 7/7] ci: run test round with parallel-checkout enabled
  2021-04-22 15:17 [PATCH 0/7] Parallel Checkout (part 3) Matheus Tavares
                   ` (5 preceding siblings ...)
  2021-04-22 15:17 ` [PATCH 6/7] parallel-checkout: add tests related to .gitattributes Matheus Tavares
@ 2021-04-22 15:17 ` Matheus Tavares
  2021-04-23 19:56   ` Derrick Stolee
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
  7 siblings, 1 reply; 47+ messages in thread
From: Matheus Tavares @ 2021-04-22 15:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git

We already have tests for the basic parallel-checkout operations. But
this code can also run be executed by other commands, such as
git-read-tree and git-sparse-checkout, which are currently not tested
with multiple workers. To promote a wider test coverage without
duplicating tests:

1. Add the GIT_TEST_CHECKOUT_WORKERS environment variable, to optionally
   force parallel-checkout execution during the whole test suite.

2. Set this variable (with a value of 2) in the second test round of our
   linux-gcc CI job. This round runs `make test` again with some
   optional GIT_TEST_* variables enabled, so there is no additional
   overhead in exercising the parallel-checkout code here.

Note that tests checking out less than two parallel-eligible entries
will fall back to the sequential mode. Nevertheless, it's still a good
exercise for the parallel-checkout framework as the fallback codepath
also writes the queued entries using the parallel-checkout functions
(only without spawning any worker).

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 ci/run-build-and-tests.sh  |  1 +
 parallel-checkout.c        | 14 ++++++++++++++
 t/README                   |  4 ++++
 t/lib-parallel-checkout.sh |  3 +++
 4 files changed, 22 insertions(+)

diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index a66b5e8c75..23b28e7391 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -25,6 +25,7 @@ linux-gcc)
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_WRITE_REV_INDEX=1
+	export GIT_TEST_CHECKOUT_WORKERS=2
 	make test
 	;;
 linux-clang)
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 6fb3f1e6c9..6b1af32bb3 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -35,6 +35,20 @@ static const int DEFAULT_NUM_WORKERS = 1;
 
 void get_parallel_checkout_configs(int *num_workers, int *threshold)
 {
+	char *env_workers = getenv("GIT_TEST_CHECKOUT_WORKERS");
+
+	if (env_workers && *env_workers) {
+		if (strtol_i(env_workers, 10, num_workers)) {
+			die("invalid value for GIT_TEST_CHECKOUT_WORKERS: '%s'",
+			    env_workers);
+		}
+		if (*num_workers < 1)
+			*num_workers = online_cpus();
+
+		*threshold = 0;
+		return;
+	}
+
 	if (git_config_get_int("checkout.workers", num_workers))
 		*num_workers = DEFAULT_NUM_WORKERS;
 	else if (*num_workers < 1)
diff --git a/t/README b/t/README
index fd9375b146..a194488f27 100644
--- a/t/README
+++ b/t/README
@@ -436,6 +436,10 @@ and "sha256".
 GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
 'pack.writeReverseIndex' setting.
 
+GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
+to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
+execution of the parallel-checkout code.
+
 Naming Tests
 ------------
 
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index 16ee18389b..0921a32b96 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -1,5 +1,8 @@
 # Helpers for t208* tests
 
+# Parallel checkout tests need full control of the number of workers
+unset GIT_TEST_CHECKOUT_WORKERS
+
 set_checkout_config () {
 	if test $# -ne 2
 	then
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/7] builtin/checkout.c: complete parallel checkout support
  2021-04-22 15:17 ` [PATCH 2/7] builtin/checkout.c: complete parallel checkout support Matheus Tavares
@ 2021-04-23 16:19   ` Derrick Stolee
  2021-04-26 21:54     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 47+ messages in thread
From: Derrick Stolee @ 2021-04-23 16:19 UTC (permalink / raw)
  To: Matheus Tavares, git; +Cc: christian.couder, git

On 4/22/2021 11:17 AM, Matheus Tavares wrote:
> There is one code path in builtin/checkout.c which still doesn't benefit
> from parallel checkout because it calls checkout_entry() directly,> instead of unpack_trees(). Let's add parallel checkout support for this
> missing spot as well.

I couldn't tell immediately from the patch what would trigger this
code path. I had to trace the method calls to discover that it is
for the case of a pathspec-limited checkout:

	git checkout <ref> -- <pathspec>

I confirmed that this does work with this change, but it might be
nice to have a test that verifies that parallelism is triggering for
this case.

Looking ahead to patches 4-6, which add tests, I do not see one for this
code path. Yes, patch 7 will implicitly test it through optional
settings, but it would be nice to verify that the code is actually using
parallel workers. The test_checkout_workers helper in patch 4 should be
helpful for this effort.

Please point out the test that covers this case, in case I'm just not
seeing it.

The good news is that I can see a difference. By alternating checkouts
of the Git repository's "t" directory between v2.20 and v2.31.1, I can
see these results for varying numbers of workers:

Benchmark #1: 16 workers
  Time (mean ± σ):     108.6 ms ±   5.2 ms    [User: 146.1 ms, System: 146.1 ms]
  Range (min … max):    95.5 ms … 124.9 ms    100 runs
 
Benchmark #2: 8 workers
  Time (mean ± σ):     104.8 ms ±   4.8 ms    [User: 128.3 ms, System: 131.7 ms]
  Range (min … max):    94.2 ms … 119.0 ms    100 runs
 
Benchmark #3: 4 workers
  Time (mean ± σ):     112.3 ms ±   6.2 ms    [User: 114.6 ms, System: 112.1 ms]
  Range (min … max):   100.0 ms … 127.4 ms    100 runs
 
Benchmark #4: 2 workers
  Time (mean ± σ):     124.2 ms ±   4.2 ms    [User: 106.5 ms, System: 102.0 ms]
  Range (min … max):   114.8 ms … 136.3 ms    100 runs
 
Benchmark #5: sequential
  Time (mean ± σ):     154.6 ms ±   6.7 ms    [User: 83.5 ms, System: 79.4 ms]
  Range (min … max):   142.1 ms … 176.0 ms    100 runs
 
Summary
  '8 workers' ran
    1.04 ± 0.07 times faster than '16 workers'
    1.07 ± 0.08 times faster than '4 workers'
    1.19 ± 0.07 times faster than '2 workers'
    1.48 ± 0.09 times faster than 'sequential'

(Note: these time measurements are for the round-trip of two checkout
commands.)
> @@ -359,16 +360,22 @@ static int checkout_worktree(const struct checkout_opts *opts,
>  	int nr_checkouts = 0, nr_unmerged = 0;
>  	int errs = 0;
>  	int pos;
> +	int pc_workers, pc_threshold;
> +	struct mem_pool ce_mem_pool;
>  
>  	state.force = 1;
>  	state.refresh_cache = 1;
>  	state.istate = &the_index;
>  
> +	mem_pool_init(&ce_mem_pool, 0);
> +	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
>  	init_checkout_metadata(&state.meta, info->refname,
>  			       info->commit ? &info->commit->object.oid : &info->oid,
>  			       NULL);
>  
>  	enable_delayed_checkout(&state);
> +	if (pc_workers > 1)
> +		init_parallel_checkout();

I'm late to looking at your parallel checkout work, but I find this
to be a really nice API to get things initialized.


>  			else if (opts->merge)
>  				errs |= checkout_merged(pos, &state,
> -							&nr_unmerged);
> +							&nr_unmerged,
> +							&ce_mem_pool);

I see the need to populate these merged entries in the pool for now,
before...

>  			pos = skip_same_name(ce, pos) - 1;
>  		}
>  	}
> +	if (pc_workers > 1)
> +		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
> +					      NULL, NULL);
> +	mem_pool_discard(&ce_mem_pool, should_validate_cache_entries());
...now running the parallel checkout and discarding the entries.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/7] checkout-index: add parallel checkout support
  2021-04-22 15:17 ` [PATCH 3/7] checkout-index: add " Matheus Tavares
@ 2021-04-23 18:32   ` Derrick Stolee
  2021-04-26 22:30     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 47+ messages in thread
From: Derrick Stolee @ 2021-04-23 18:32 UTC (permalink / raw)
  To: Matheus Tavares, git; +Cc: christian.couder, git

On 4/22/2021 11:17 AM, Matheus Tavares wrote:
> @@ -142,11 +143,7 @@ static void checkout_all(const char *prefix, int prefix_length)
>  	}
>  	if (last_ce && to_tempfile)
>  		write_tempfile_record(last_ce->name, prefix);
> -	if (errs)
> -		/* we have already done our error reporting.
> -		 * exit with the same code as die().
> -		 */
> -		exit(128);

Here, it makes note of returning 128 as if it were a die(), but

> +	if (all)
> +		err |= checkout_all(prefix, prefix_length);
> +
> +	if (pc_workers > 1)
> +		err |= run_parallel_checkout(&state, pc_workers, pc_threshold,
> +					     NULL, NULL);
> +
>  	if (err)
>  		return 1;

This returns 1 instead. Should we `return err` and use an error
code specific to the response? I imagine there are other reasons
that could cause a non-zero return for checkout_all() and
run_parallel_checkout().

I suppose: is there a value in persisting the 128 response code
here, or is an exit of 1 sufficient? There is no documentation
about the exit code, so any dependence on 128 is not based on a
"written promise" but I thought it was worth mentioning.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/7] parallel-checkout: add tests for basic operations
  2021-04-22 15:17 ` [PATCH 4/7] parallel-checkout: add tests for basic operations Matheus Tavares
@ 2021-04-23 19:18   ` Derrick Stolee
  2021-04-27  2:30     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 47+ messages in thread
From: Derrick Stolee @ 2021-04-23 19:18 UTC (permalink / raw)
  To: Matheus Tavares, git; +Cc: christian.couder, git, Jonathan Nieder

On 4/22/2021 11:17 AM, Matheus Tavares wrote:
> Add tests to populate the working tree during clone and checkout using
> sequential and parallel mode, to confirm that they produce identical
> results. Also test basic checkout mechanics, such as checking for
> symlinks in the leading directories and the abidance to --force.
> 
> Note: some helper functions are added to a common lib file which is only
> included by t2080 for now. But they will also be used by other
> parallel-checkout tests in the following patches.
> 
> Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>

Is this a standard thing? Or, did you change the patch significantly
enough that "Co-authored-by:" is no longer appropriate?

> +++ b/t/lib-parallel-checkout.sh
> @@ -0,0 +1,37 @@
> +# Helpers for t208* tests

I could see tests outside of the t208* range possibly having
value in interacting with parallel checkout in the future.

Perhaps:

	# Helpers for tests invoking parallel-checkout

> +
> +set_checkout_config () {
> +	if test $# -ne 2
> +	then
> +		BUG "set_checkout_config() requires two arguments"
> +	fi &&

A usage comment is typically helpful for these helpers:

# set_checkout_config <workers> <threshold>
# sets global config values to use the given number of
# workers when the given threshold is met.

> +
> +	test_config_global checkout.workers $1 &&
> +	test_config_global checkout.thresholdForParallelism $2
> +}
> +
> +# Run "${@:2}" and check that $1 checkout workers were used
> +test_checkout_workers () {

This simpler doc style works, too.

> +	if test $# -lt 2
> +	then
> +		BUG "too few arguments to test_checkout_workers()"
> +	fi &&
> +
> +	expected_workers=$1 &&
> +	shift &&
> +
> +	rm -f trace &&
> +	GIT_TRACE2="$(pwd)/trace" "$@" &&
> +
> +	workers=$(grep "child_start\[..*\] git checkout--worker" trace | wc -l) &&
> +	test $workers -eq $expected_workers &&
> +	rm -f trace

I wonder if this should be a "test_when_finished rm -f trace" being
recorded earlier in the step. It would also benefit from using a more
specific name than "trace". Something like "trace-test-checkout-workers"
would be unlikely to collide with someone else's trace.

> +# Verify that both the working tree and the index were created correctly
> +verify_checkout () {

Add usage of a repo in $1?

> +	git -C "$1" diff-index --quiet HEAD -- &&
> +	git -C "$1" diff-index --quiet --cached HEAD -- &&
> +	git -C "$1" status --porcelain >"$1".status &&
> +	test_must_be_empty "$1".status
> +}


> +TEST_NO_CREATE_REPO=1
> +. ./test-lib.sh
> +. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
> +
> +# Test parallel-checkout with a branch switch containing file creations,
> +# deletions, and modification; with different entry types. Switching from B1 to
> +# B2 will have the following changes:
> +#
> +# - a (file):      modified
> +# - e/x (file):    deleted
> +# - b (symlink):   deleted
> +# - b/f (file):    created
> +# - e (symlink):   created
> +# - d (submodule): created
> +#

An interesting set of changes. What about a directory/file conflict?

Something like this might be useful:

 # - f/t (file):   deleted
 # - f (file):     created

in fact, it could be interesting to have a file conflict with each
of these types, such as the symlink 'e' and the submodule 'd'.

While we are at it, what about a symlink/submodule conflict? I know
it makes the test bigger, but doing everything simultaneously through
a carefully designed repository helps prevent test case bloat.

> +test_expect_success SYMLINKS 'setup repo for checkout with various types of changes' '

Could we split this setup step into two parts? Once could
set up everything except the symlinks and would not require
the SYMLINKS prereq. We could then have another test with
the SYMLINKS prereq that extends B1 and B2 to have symlinks
and their conflicts. The remaining tests would work on any
platform without needing the SYMLINKS prereq.

> +test_expect_success SYMLINKS 'sequential checkout' '
> +	cp -R various various_sequential &&
> +	set_checkout_config 1 0 &&
> +	test_checkout_workers 0 \
> +		git -C various_sequential checkout --recurse-submodules B2 &&
> +	verify_checkout various_sequential
> +'

I see all these tests are very similar. Perhaps group
them to demonstrate their differences?

parallel_test_case () {
	test_expect_success "$1" "
		cp -R various $2 &&
		set_checkout_config $3 $4 &&
		test_checkout_workers $5 \
			git -C $2 checkout --recurse-submodules B2 &&
		verify_checkout $2
	"
}

parallel_test_case 'sequential checkout' \
	various_sequential 1 0 0
parallel_test_case 'parallel checkout' \
	various_parallel 2 0 2
parallel_test_case 'fallback to sequential checkout (threshold)' \
	various_sequential_fallback 2 100 0

> +test_expect_success SYMLINKS 'parallel checkout on clone' '
> +	git -C various checkout --recurse-submodules B2 &&
> +	set_checkout_config 2 0 &&
> +	test_checkout_workers 2 \
> +		git clone --recurse-submodules various various_parallel_clone &&
> +	verify_checkout various_parallel_clone
> +'
> +
> +test_expect_success SYMLINKS 'fallback to sequential checkout on clone (threshold)' '
> +	git -C various checkout --recurse-submodules B2 &&
> +	set_checkout_config 2 100 &&
> +	test_checkout_workers 0 \
> +		git clone --recurse-submodules various various_sequential_fallback_clone &&
> +	verify_checkout various_sequential_fallback_clone
> +'

Doing a similar grouping for the clone case might be interesting,
if only for the possible future where more customization might be
necessary.

Since the clone case is only caring about the contents at B2, it
is good that B2 contains one of each type of entry.

> +# Just to be paranoid, actually compare the working trees' contents directly.
> +test_expect_success SYMLINKS 'compare the working trees' '
> +	rm -rf various_*/.git &&
> +	rm -rf various_*/d/.git &&
> +
> +	diff -r various_sequential various_parallel &&
> +	diff -r various_sequential various_sequential_fallback &&
> +	diff -r various_sequential various_parallel_clone &&
> +	diff -r various_sequential various_sequential_fallback_clone
> +'
> +
> +test_expect_success 'parallel checkout respects --[no]-force' '
> +	set_checkout_config 2 0 &&
> +	git init dirty &&
> +	(
> +		cd dirty &&
> +		mkdir D &&
> +		test_commit D/F &&
> +		test_commit F &&
> +
> +		rm -rf D &&
> +		echo changed >D &&
> +		echo changed >F.t &&
> +
> +		# We expect 0 workers because there is nothing to be done
> +		test_checkout_workers 0 git checkout HEAD &&
> +		test_path_is_file D &&
> +		grep changed D &&
> +		grep changed F.t &&
> +
> +		test_checkout_workers 2 git checkout --force HEAD &&
> +		test_path_is_dir D &&
> +		grep D/F D/F.t &&
> +		grep F F.t
> +	)
> +'

I see SYMLINKS is not necessary here due to creating a new repo.
Still better to not have the prereq when not necessary.

> +test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading dirs' '
> +	set_checkout_config 2 0 &&
> +	git init symlinks &&
> +	(
> +		cd symlinks &&
> +		mkdir D untracked &&
> +		# Commit 2 files to have enough work for 2 parallel workers
> +		test_commit D/A &&
> +		test_commit D/B &&
> +		rm -rf D &&
> +		ln -s untracked D &&
> +
> +		test_checkout_workers 2 git checkout --force HEAD &&
> +		! test -h D &&
> +		grep D/A D/A.t &&
> +		grep D/B D/B.t
> +	)
> +'

This test can continue to require SYMLINKS.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 6/7] parallel-checkout: add tests related to .gitattributes
  2021-04-22 15:17 ` [PATCH 6/7] parallel-checkout: add tests related to .gitattributes Matheus Tavares
@ 2021-04-23 19:48   ` Derrick Stolee
  0 siblings, 0 replies; 47+ messages in thread
From: Derrick Stolee @ 2021-04-23 19:48 UTC (permalink / raw)
  To: Matheus Tavares, git; +Cc: christian.couder, git

On 4/22/2021 11:17 AM, Matheus Tavares wrote:
> Add tests to confirm that the `struct conv_attrs` data is correctly
> passed from the main process to the workers, and that they can properly
> convert the blobs before writing them to the working tree.
> 
> Also check that parallel-ineligible entries, such as regular files that
> require external filters, are correctly smudge and written when
> parallel-checkout is enabled.
> 
> Note: to avoid repeating code, some helper functions are extracted from
> t0028 into a common lib file.

nit: this movement of helper functions could be its own patch to reduce
cognitive load when already reading a very detailed set of tests.

> diff --git a/t/t2082-parallel-checkout-attributes.sh b/t/t2082-parallel-checkout-attributes.sh
> new file mode 100755
> index 0000000000..2525457961
> --- /dev/null
> +++ b/t/t2082-parallel-checkout-attributes.sh
> @@ -0,0 +1,194 @@
> +#!/bin/sh
> +
> +test_description='parallel-checkout: attributes
> +
> +Verify that parallel-checkout correctly creates files that require
> +conversions, as specified in .gitattributes. The main point here is
> +to check that the conv_attr data is correctly sent to the workers
> +and that it contains sufficient information to smudge files
> +properly (without access to the index or attribute stack).
> +'

The tests themselves are quite dense, but definitely appear to be
doing valuable cases to cover an important feature of parallel
checkout. Each one is so different from the other that I can't
find ways to reorganize them for better readability.

Thanks,
-Stolee
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 7/7] ci: run test round with parallel-checkout enabled
  2021-04-22 15:17 ` [PATCH 7/7] ci: run test round with parallel-checkout enabled Matheus Tavares
@ 2021-04-23 19:56   ` Derrick Stolee
  0 siblings, 0 replies; 47+ messages in thread
From: Derrick Stolee @ 2021-04-23 19:56 UTC (permalink / raw)
  To: Matheus Tavares, git; +Cc: christian.couder, git

On 4/22/2021 11:17 AM, Matheus Tavares wrote:
> We already have tests for the basic parallel-checkout operations. But
> this code can also run be executed by other commands, such as
> git-read-tree and git-sparse-checkout, which are currently not tested
> with multiple workers. To promote a wider test coverage without
> duplicating tests:
> 
> 1. Add the GIT_TEST_CHECKOUT_WORKERS environment variable, to optionally
>    force parallel-checkout execution during the whole test suite.
> 
> 2. Set this variable (with a value of 2) in the second test round of our
>    linux-gcc CI job. This round runs `make test` again with some
>    optional GIT_TEST_* variables enabled, so there is no additional
>    overhead in exercising the parallel-checkout code here.
> 
> Note that tests checking out less than two parallel-eligible entries
> will fall back to the sequential mode. Nevertheless, it's still a good
> exercise for the parallel-checkout framework as the fallback codepath
> also writes the queued entries using the parallel-checkout functions
> (only without spawning any worker).

I agree that this is helpful additional testing.

> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  ci/run-build-and-tests.sh  |  1 +
>  parallel-checkout.c        | 14 ++++++++++++++
>  t/README                   |  4 ++++
>  t/lib-parallel-checkout.sh |  3 +++
>  4 files changed, 22 insertions(+)
> 
> diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
> index a66b5e8c75..23b28e7391 100755
> --- a/ci/run-build-and-tests.sh
> +++ b/ci/run-build-and-tests.sh
> @@ -25,6 +25,7 @@ linux-gcc)
>  	export GIT_TEST_ADD_I_USE_BUILTIN=1
>  	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
>  	export GIT_TEST_WRITE_REV_INDEX=1
> +	export GIT_TEST_CHECKOUT_WORKERS=2
>  	make test
>  	;;

(Making a note to audit the entries here, as I believe some
of these could be enabled by default or no longer do anything.)

>  void get_parallel_checkout_configs(int *num_workers, int *threshold)
>  {
> +	char *env_workers = getenv("GIT_TEST_CHECKOUT_WORKERS");
> +
> +	if (env_workers && *env_workers) {
> +		if (strtol_i(env_workers, 10, num_workers)) {

I had to look up how strtol_i() differs from strtol(). It seems
like we should use strtol_i() more often than we do.

> +			die("invalid value for GIT_TEST_CHECKOUT_WORKERS: '%s'",
> +			    env_workers);
> +		}
> +		if (*num_workers < 1)
> +			*num_workers = online_cpus();
> +
> +		*threshold = 0;
> +		return;
> +	}

LGTM.

Thanks,
-Stolee



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/7] builtin/checkout.c: complete parallel checkout support
  2021-04-23 16:19   ` Derrick Stolee
@ 2021-04-26 21:54     ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-26 21:54 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Christian Couder, Jeff Hostetler

On Fri, Apr 23, 2021 at 1:19 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/22/2021 11:17 AM, Matheus Tavares wrote:
> > There is one code path in builtin/checkout.c which still doesn't benefit
> > from parallel checkout because it calls checkout_entry() directly,> instead of unpack_trees(). Let's add parallel checkout support for this
> > missing spot as well.
>
> I couldn't tell immediately from the patch what would trigger this
> code path. I had to trace the method calls to discover that it is
> for the case of a pathspec-limited checkout:
>
>         git checkout <ref> -- <pathspec>

Oops, I should have mentioned that in the commit message. Thanks for
pointing it out.

> I confirmed that this does work with this change, but it might be
> nice to have a test that verifies that parallelism is triggering for
> this case.
>
> Looking ahead to patches 4-6, which add tests, I do not see one for this
> code path. Yes, patch 7 will implicitly test it through optional
> settings, but it would be nice to verify that the code is actually using
> parallel workers. The test_checkout_workers helper in patch 4 should be
> helpful for this effort.
>
> Please point out the test that covers this case, in case I'm just not
> seeing it.

Hmm, there are some tests at t2081 and t2082 that check the
pathspec-limited case with parallel workers. For example the collision
tests run `test_checkout_workers 2 git checkout .`. We also test
direct pathnames in t2082, using `test_checkout_workers 2 git checkout
A B`.

> The good news is that I can see a difference. By alternating checkouts
> of the Git repository's "t" directory between v2.20 and v2.31.1, I can
> see these results for varying numbers of workers:
>
> Benchmark #1: 16 workers
>   Time (mean ± σ):     108.6 ms ±   5.2 ms    [User: 146.1 ms, System: 146.1 ms]
>   Range (min … max):    95.5 ms … 124.9 ms    100 runs
>
> Benchmark #2: 8 workers
>   Time (mean ± σ):     104.8 ms ±   4.8 ms    [User: 128.3 ms, System: 131.7 ms]
>   Range (min … max):    94.2 ms … 119.0 ms    100 runs
>
> Benchmark #3: 4 workers
>   Time (mean ± σ):     112.3 ms ±   6.2 ms    [User: 114.6 ms, System: 112.1 ms]
>   Range (min … max):   100.0 ms … 127.4 ms    100 runs
>
> Benchmark #4: 2 workers
>   Time (mean ± σ):     124.2 ms ±   4.2 ms    [User: 106.5 ms, System: 102.0 ms]
>   Range (min … max):   114.8 ms … 136.3 ms    100 runs
>
> Benchmark #5: sequential
>   Time (mean ± σ):     154.6 ms ±   6.7 ms    [User: 83.5 ms, System: 79.4 ms]
>   Range (min … max):   142.1 ms … 176.0 ms    100 runs
>
> Summary
>   '8 workers' ran
>     1.04 ± 0.07 times faster than '16 workers'
>     1.07 ± 0.08 times faster than '4 workers'
>     1.19 ± 0.07 times faster than '2 workers'
>     1.48 ± 0.09 times faster than 'sequential'

Nice! Thanks for the benchmark!

> (Note: these time measurements are for the round-trip of two checkout
> commands.)
> > @@ -359,16 +360,22 @@ static int checkout_worktree(const struct checkout_opts *opts,
> >       int nr_checkouts = 0, nr_unmerged = 0;
> >       int errs = 0;
> >       int pos;
> > +     int pc_workers, pc_threshold;
> > +     struct mem_pool ce_mem_pool;
> >
> >       state.force = 1;
> >       state.refresh_cache = 1;
> >       state.istate = &the_index;
> >
> > +     mem_pool_init(&ce_mem_pool, 0);
> > +     get_parallel_checkout_configs(&pc_workers, &pc_threshold);
> >       init_checkout_metadata(&state.meta, info->refname,
> >                              info->commit ? &info->commit->object.oid : &info->oid,
> >                              NULL);
> >
> >       enable_delayed_checkout(&state);
> > +     if (pc_workers > 1)
> > +             init_parallel_checkout();
>
> I'm late to looking at your parallel checkout work, but I find this
> to be a really nice API to get things initialized.

Thanks :)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/7] checkout-index: add parallel checkout support
  2021-04-23 18:32   ` Derrick Stolee
@ 2021-04-26 22:30     ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-26 22:30 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Christian Couder, Jeff Hostetler

On Fri, Apr 23, 2021 at 3:32 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/22/2021 11:17 AM, Matheus Tavares wrote:
> > @@ -142,11 +143,7 @@ static void checkout_all(const char *prefix, int prefix_length)
> >       }
> >       if (last_ce && to_tempfile)
> >               write_tempfile_record(last_ce->name, prefix);
> > -     if (errs)
> > -             /* we have already done our error reporting.
> > -              * exit with the same code as die().
> > -              */
> > -             exit(128);
>
> Here, it makes note of returning 128 as if it were a die(), but
>
> > +     if (all)
> > +             err |= checkout_all(prefix, prefix_length);
> > +
> > +     if (pc_workers > 1)
> > +             err |= run_parallel_checkout(&state, pc_workers, pc_threshold,
> > +                                          NULL, NULL);
> > +
> >       if (err)
> >               return 1;
>
> This returns 1 instead. Should we `return err` and use an error
> code specific to the response? I imagine there are other reasons
> that could cause a non-zero return for checkout_all() and
> run_parallel_checkout().

Hmm, I think checkout_entry() returns -1 for any kind of error, and
checkout_all()'s `err` only holds the number of checkout_entry()'s
failures. run_parallel_checkout() also always use -1 for errors.

> I suppose: is there a value in persisting the 128 response code
> here, or is an exit of 1 sufficient? There is no documentation
> about the exit code, so any dependence on 128 is not based on a
> "written promise" but I thought it was worth mentioning.

Yeah, I also wondered if we should keep the 128 exit code here... But
I couldn't see much value in doing so, unless there are scripts
actually expecting 128. But I guess this is unlikely, right?

The additional bonus of unifying the exit path for `git checkout-index
--all` and `git checkout-index <path>` is that the code flow becomes a
little easier to read, especially now that they both need to
[conditionally] call run_parallel_checkout() before exiting.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/7] parallel-checkout: add tests for basic operations
  2021-04-23 19:18   ` Derrick Stolee
@ 2021-04-27  2:30     ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-27  2:30 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Christian Couder, Jeff Hostetler, Jonathan Nieder

On Fri, Apr 23, 2021 at 4:18 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/22/2021 11:17 AM, Matheus Tavares wrote:
> > Add tests to populate the working tree during clone and checkout using
> > sequential and parallel mode, to confirm that they produce identical
> > results. Also test basic checkout mechanics, such as checking for
> > symlinks in the leading directories and the abidance to --force.
> >
> > Note: some helper functions are added to a common lib file which is only
> > included by t2080 for now. But they will also be used by other
> > parallel-checkout tests in the following patches.
> >
> > Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
>
> Is this a standard thing? Or, did you change the patch significantly
> enough that "Co-authored-by:" is no longer appropriate?

No, I think "Co-authored-by" is appropriate, let's change this trailer :)

> > +++ b/t/lib-parallel-checkout.sh
> > @@ -0,0 +1,37 @@
> > +# Helpers for t208* tests
>
> I could see tests outside of the t208* range possibly having
> value in interacting with parallel checkout in the future.
>
> Perhaps:
>
>         # Helpers for tests invoking parallel-checkout

Will do!

> > +
> > +set_checkout_config () {
> > +     if test $# -ne 2
> > +     then
> > +             BUG "set_checkout_config() requires two arguments"
> > +     fi &&
>
> A usage comment is typically helpful for these helpers:
>
> # set_checkout_config <workers> <threshold>
> # sets global config values to use the given number of
> # workers when the given threshold is met.

OK, I'll change that.

> > +
> > +     test_config_global checkout.workers $1 &&
> > +     test_config_global checkout.thresholdForParallelism $2
> > +}
> > +
> > +# Run "${@:2}" and check that $1 checkout workers were used
> > +test_checkout_workers () {
>
> This simpler doc style works, too.
>
> > +     if test $# -lt 2
> > +     then
> > +             BUG "too few arguments to test_checkout_workers()"
> > +     fi &&
> > +
> > +     expected_workers=$1 &&
> > +     shift &&
> > +
> > +     rm -f trace &&
> > +     GIT_TRACE2="$(pwd)/trace" "$@" &&
> > +
> > +     workers=$(grep "child_start\[..*\] git checkout--worker" trace | wc -l) &&
> > +     test $workers -eq $expected_workers &&
> > +     rm -f trace
>
> I wonder if this should be a "test_when_finished rm -f trace" being
> recorded earlier in the step.It would also benefit from using a more
> specific name than "trace". Something like "trace-test-checkout-workers"
> would be unlikely to collide with someone else's trace.

Good idea, I'll change the trace file name.

Unfortunately, I think we won't be able to use `test_when_finished`
here as this function is sometimes called inside subshells, and
`test_when_finished` doesn't work in this situation :(

> > +# Verify that both the working tree and the index were created correctly
> > +verify_checkout () {
>
> Add usage of a repo in $1?

Sure!

> > +     git -C "$1" diff-index --quiet HEAD -- &&
> > +     git -C "$1" diff-index --quiet --cached HEAD -- &&
> > +     git -C "$1" status --porcelain >"$1".status &&
> > +     test_must_be_empty "$1".status
> > +}
>
>
> > +TEST_NO_CREATE_REPO=1
> > +. ./test-lib.sh
> > +. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
> > +
> > +# Test parallel-checkout with a branch switch containing file creations,
> > +# deletions, and modification; with different entry types. Switching from B1 to
> > +# B2 will have the following changes:
> > +#
> > +# - a (file):      modified
> > +# - e/x (file):    deleted
> > +# - b (symlink):   deleted
> > +# - b/f (file):    created
> > +# - e (symlink):   created
> > +# - d (submodule): created
> > +#
>
> An interesting set of changes. What about a directory/file conflict?
>
> Something like this might be useful:
>
>  # - f/t (file):   deleted
>  # - f (file):     created
>
> in fact, it could be interesting to have a file conflict with each
> of these types, such as the symlink 'e' and the submodule 'd'.

Sure, I'll add those. (But we already have the symlink case with 'e/x'
(file) and 'e' (symlink), no?)

> While we are at it, what about a symlink/submodule conflict? I know
> it makes the test bigger, but doing everything simultaneously through
> a carefully designed repository helps prevent test case bloat.
>
> > +test_expect_success SYMLINKS 'setup repo for checkout with various types of changes' '
>
> Could we split this setup step into two parts? Once could
> set up everything except the symlinks and would not require
> the SYMLINKS prereq. We could then have another test with
> the SYMLINKS prereq that extends B1 and B2 to have symlinks
> and their conflicts. The remaining tests would work on any
> platform without needing the SYMLINKS prereq.

Good point. I think we might be able to create the symlinks with
`test_ln_s_add` and completely get rid of the SYMLINKS prereq :)

> > +test_expect_success SYMLINKS 'sequential checkout' '
> > +     cp -R various various_sequential &&
> > +     set_checkout_config 1 0 &&
> > +     test_checkout_workers 0 \
> > +             git -C various_sequential checkout --recurse-submodules B2 &&
> > +     verify_checkout various_sequential
> > +'
>
> I see all these tests are very similar. Perhaps group
> them to demonstrate their differences?

Makes sense, I'll do that.

> parallel_test_case () {
>         test_expect_success "$1" "
>                 cp -R various $2 &&
>                 set_checkout_config $3 $4 &&
>                 test_checkout_workers $5 \
>                         git -C $2 checkout --recurse-submodules B2 &&
>                 verify_checkout $2
>         "
> }
>
> parallel_test_case 'sequential checkout' \
>         various_sequential 1 0 0
> parallel_test_case 'parallel checkout' \
>         various_parallel 2 0 2
> parallel_test_case 'fallback to sequential checkout (threshold)' \
>         various_sequential_fallback 2 100 0
>
> > +test_expect_success SYMLINKS 'parallel checkout on clone' '
> > +     git -C various checkout --recurse-submodules B2 &&
> > +     set_checkout_config 2 0 &&
> > +     test_checkout_workers 2 \
> > +             git clone --recurse-submodules various various_parallel_clone &&
> > +     verify_checkout various_parallel_clone
> > +'

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 0/8] Parallel Checkout (part 3)
  2021-04-22 15:17 [PATCH 0/7] Parallel Checkout (part 3) Matheus Tavares
                   ` (6 preceding siblings ...)
  2021-04-22 15:17 ` [PATCH 7/7] ci: run test round with parallel-checkout enabled Matheus Tavares
@ 2021-04-30 21:40 ` Matheus Tavares
  2021-04-30 21:40   ` [PATCH v2 1/8] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
                     ` (9 more replies)
  7 siblings, 10 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-04-30 21:40 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee

This is the last part of the parallel checkout series. It's based on
mt/parallel-checkout-part-2. This part adds parallel checkout support to
"git checkout-index" and "git checkout <pathspec>", and adds tests for
the parallel checkout framework.

Changes since v2:

Patch 2:
- Mentioned in the commit message which checkout mode the patch is
  adding parallel checkout support for (i.e. pathspec-limited mode).

Patch 4:

  t/lib-parallel-checkout.sh:
  - Added/improved usage message for the helper functions.
  - Renamed trace file in test_checkout_workers to avoid collisions.
  - At verify_checkout:

      * Used --ignore-submodules=none for the diff-index execution. (I
	hadn't noticed before that the default behavior changed to
	'untracked' on git v2.31.0.)

      * Removed the `diff-index --cached` check as it was redundant.
	We are already implicitly checking the index by asserting that
	`git status` is empty after the `git diff-index HEAD` check.

  t/t2080:
  - Used `test_ln_s_add`, to avoid requiring the SYMLINKS prereq in all
    tests but one.
  - Grouped the basic checkout/clone tests in a for loop to make make it
    easier to see what are the differences between then (i.e. the
    different cases they are testing).
  - Added more entry type changes in the branch switch used to check
    parallel checkout correctness.
  - Added a test to ensure submodules can also use parallel checkout.
  - Compare the working trees with `git diff --no-index` instead of
    `diff -r` to avoid following symlinks.

Patch 6:
- Split patch into two: one moving the helper functions from t0028 to
  lib-encoding.sh, and another actually adding the t2082 tests.

Matheus Tavares (8):
  make_transient_cache_entry(): optionally alloc from mem_pool
  builtin/checkout.c: complete parallel checkout support
  checkout-index: add parallel checkout support
  parallel-checkout: add tests for basic operations
  parallel-checkout: add tests related to path collisions
  t0028: extract encoding helpers to lib-encoding.sh
  parallel-checkout: add tests related to .gitattributes
  ci: run test round with parallel-checkout enabled

 builtin/checkout--worker.c              |   2 +-
 builtin/checkout-index.c                |  24 ++-
 builtin/checkout.c                      |  20 ++-
 builtin/difftool.c                      |   2 +-
 cache.h                                 |  11 +-
 ci/run-build-and-tests.sh               |   1 +
 parallel-checkout.c                     |  18 ++
 read-cache.c                            |  12 +-
 t/README                                |   4 +
 t/lib-encoding.sh                       |  25 +++
 t/lib-parallel-checkout.sh              |  45 +++++
 t/t0028-working-tree-encoding.sh        |  25 +--
 t/t2080-parallel-checkout-basics.sh     | 229 ++++++++++++++++++++++++
 t/t2081-parallel-checkout-collisions.sh | 162 +++++++++++++++++
 t/t2082-parallel-checkout-attributes.sh | 194 ++++++++++++++++++++
 unpack-trees.c                          |   2 +-
 16 files changed, 727 insertions(+), 49 deletions(-)
 create mode 100644 t/lib-encoding.sh
 create mode 100644 t/lib-parallel-checkout.sh
 create mode 100755 t/t2080-parallel-checkout-basics.sh
 create mode 100755 t/t2081-parallel-checkout-collisions.sh
 create mode 100755 t/t2082-parallel-checkout-attributes.sh

Range-diff against v1:
1:  f870040bfb = 1:  f870040bfb make_transient_cache_entry(): optionally alloc from mem_pool
2:  5e0dee7beb ! 2:  e2d82c4337 builtin/checkout.c: complete parallel checkout support
    @@ Metadata
      ## Commit message ##
         builtin/checkout.c: complete parallel checkout support
     
    -    There is one code path in builtin/checkout.c which still doesn't benefit
    -    from parallel checkout because it calls checkout_entry() directly,
    -    instead of unpack_trees(). Let's add parallel checkout support for this
    -    missing spot as well. Note: the transient cache entries allocated in
    -    checkout_merged() are now allocated in a mem_pool which is only
    -    discarded after parallel checkout finishes. This is done because the
    -    entries need to be valid when run_parallel_checkout() is called.
    +    Pathspec-limited checkouts (like `git checkout *.txt`) are performed by
    +    a code path that doesn't yet support parallel checkout because it calls
    +    checkout_entry() directly, instead of unpack_trees(). Let's add parallel
    +    checkout support for this code path too.
    +
    +    Note: the transient cache entries allocated in checkout_merged() are now
    +    allocated in a mem_pool which is only discarded after parallel checkout
    +    finishes. This is done because the entries need to be valid when
    +    run_parallel_checkout() is called.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
3:  6bf1db2fd8 = 3:  0fe1a5fabc checkout-index: add parallel checkout support
4:  46128ba8d8 ! 4:  f604c50dba parallel-checkout: add tests for basic operations
    @@ Commit message
         included by t2080 for now. But they will also be used by other
         parallel-checkout tests in the following patches.
     
    -    Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
    -    Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
    +    Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## t/lib-parallel-checkout.sh (new) ##
     @@
    -+# Helpers for t208* tests
    ++# Helpers for tests invoking parallel-checkout
     +
     +set_checkout_config () {
     +	if test $# -ne 2
     +	then
    -+		BUG "set_checkout_config() requires two arguments"
    ++		BUG "usage: set_checkout_config <workers> <threshold>"
     +	fi &&
     +
     +	test_config_global checkout.workers $1 &&
    @@ t/lib-parallel-checkout.sh (new)
     +test_checkout_workers () {
     +	if test $# -lt 2
     +	then
    -+		BUG "too few arguments to test_checkout_workers()"
    ++		BUG "too few arguments to test_checkout_workers"
     +	fi &&
     +
    -+	expected_workers=$1 &&
    ++	local expected_workers=$1 &&
     +	shift &&
     +
    -+	rm -f trace &&
    -+	GIT_TRACE2="$(pwd)/trace" "$@" &&
    ++	local trace_file=trace-test-checkout-workers &&
    ++	rm -f "$trace_file" &&
    ++	GIT_TRACE2="$(pwd)/$trace_file" "$@" &&
     +
    -+	workers=$(grep "child_start\[..*\] git checkout--worker" trace | wc -l) &&
    ++	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
     +	test $workers -eq $expected_workers &&
    -+	rm -f trace
    ++	rm "$trace_file"
     +}
     +
     +# Verify that both the working tree and the index were created correctly
     +verify_checkout () {
    -+	git -C "$1" diff-index --quiet HEAD -- &&
    -+	git -C "$1" diff-index --quiet --cached HEAD -- &&
    ++	if test $# -ne 1
    ++	then
    ++		BUG "usage: verify_checkout <repository path>"
    ++	fi &&
    ++
    ++	git -C "$1" diff-index --ignore-submodules=none --exit-code HEAD -- &&
     +	git -C "$1" status --porcelain >"$1".status &&
     +	test_must_be_empty "$1".status
     +}
    @@ t/t2080-parallel-checkout-basics.sh (new)
     +. ./test-lib.sh
     +. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
     +
    -+# Test parallel-checkout with a branch switch containing file creations,
    -+# deletions, and modification; with different entry types. Switching from B1 to
    -+# B2 will have the following changes:
    ++# Test parallel-checkout with a branch switch containing a variety of file
    ++# creations, deletions, and modifications, involving different entry types.
    ++# The branches B1 and B2 have the following paths:
    ++#
    ++#      B1                 B2
    ++#  a/a (file)         a   (file)
    ++#  b   (file)         b/b (file)
    ++#
    ++#  c/c (file)         c   (symlink)
    ++#  d   (symlink)      d/d (file)
    ++#
    ++#  e/e (file)         e   (submodule)
    ++#  f   (submodule)    f/f (file)
    ++#
    ++#  g   (submodule)    g   (symlink)
    ++#  h   (symlink)      h   (submodule)
    ++#
    ++# Additionally, the following paths are present on both branches, but with
    ++# different contents:
    ++#
    ++#  i   (file)         i   (file)
    ++#  j   (symlink)      j   (symlink)
    ++#  k   (submodule)    k   (submodule)
     +#
    -+# - a (file):      modified
    -+# - e/x (file):    deleted
    -+# - b (symlink):   deleted
    -+# - b/f (file):    created
    -+# - e (symlink):   created
    -+# - d (submodule): created
    ++# And the following paths are only present in one of the branches:
     +#
    -+test_expect_success SYMLINKS 'setup repo for checkout with various types of changes' '
    ++#  l/l (file)         -
    ++#  -                  m/m (file)
    ++#
    ++test_expect_success 'setup repo for checkout with various types of changes' '
    ++	git init sub &&
    ++	(
    ++		cd sub &&
    ++		git checkout -b B2 &&
    ++		echo B2 >file &&
    ++		git add file &&
    ++		git commit -m file &&
    ++
    ++		git checkout -b B1 &&
    ++		echo B1 >file &&
    ++		git add file &&
    ++		git commit -m file
    ++	) &&
    ++
     +	git init various &&
     +	(
     +		cd various &&
    ++
     +		git checkout -b B1 &&
    -+		echo a >a &&
    -+		mkdir e &&
    -+		echo e/x >e/x &&
    -+		ln -s e b &&
    -+		git add -A &&
    ++		mkdir a c e &&
    ++		echo a/a >a/a &&
    ++		echo b >b &&
    ++		echo c/c >c/c &&
    ++		test_ln_s_add c d &&
    ++		echo e/e >e/e &&
    ++		git submodule add ../sub f &&
    ++		git submodule add ../sub g &&
    ++		test_ln_s_add c h &&
    ++
    ++		echo "B1 i" >i &&
    ++		test_ln_s_add c j &&
    ++		git submodule add -b B1 ../sub k &&
    ++		mkdir l &&
    ++		echo l/l >l/l &&
    ++
    ++		git add . &&
     +		git commit -m B1 &&
     +
     +		git checkout -b B2 &&
    -+		echo modified >a &&
    -+		rm -rf e &&
    -+		rm b &&
    -+		mkdir b &&
    -+		echo b/f >b/f &&
    -+		ln -s b e &&
    -+		git init d &&
    -+		test_commit -C d f &&
    -+		git submodule add ./d &&
    -+		git add -A &&
    ++		git rm -rf :^.gitmodules :^k &&
    ++		mkdir b d f &&
    ++		echo a >a &&
    ++		echo b/b >b/b &&
    ++		test_ln_s_add b c &&
    ++		echo d/d >d/d &&
    ++		git submodule add ../sub e &&
    ++		echo f/f >f/f &&
    ++		test_ln_s_add b g &&
    ++		git submodule add ../sub h &&
    ++
    ++		echo "B2 i" >i &&
    ++		test_ln_s_add b j &&
    ++		git -C k checkout B2 &&
    ++		mkdir m &&
    ++		echo m/m >m/m &&
    ++
    ++		git add . &&
     +		git commit -m B2 &&
     +
     +		git checkout --recurse-submodules B1
     +	)
     +'
     +
    -+test_expect_success SYMLINKS 'sequential checkout' '
    -+	cp -R various various_sequential &&
    -+	set_checkout_config 1 0 &&
    -+	test_checkout_workers 0 \
    -+		git -C various_sequential checkout --recurse-submodules B2 &&
    -+	verify_checkout various_sequential
    -+'
    ++for mode in sequential parallel sequential-fallback
    ++do
    ++	case $mode in
    ++	sequential)          workers=1 threshold=0 expected_workers=0 ;;
    ++	parallel)            workers=2 threshold=0 expected_workers=2 ;;
    ++	sequential-fallback) workers=2 threshold=100 expected_workers=0 ;;
    ++	esac
     +
    -+test_expect_success SYMLINKS 'parallel checkout' '
    -+	cp -R various various_parallel &&
    -+	set_checkout_config 2 0 &&
    -+	test_checkout_workers 2 \
    -+		git -C various_parallel checkout --recurse-submodules B2 &&
    -+	verify_checkout various_parallel
    -+'
    ++	test_expect_success "$mode checkout" '
    ++		repo=various_$mode &&
    ++		cp -R various $repo &&
     +
    -+test_expect_success SYMLINKS 'fallback to sequential checkout (threshold)' '
    -+	cp -R various various_sequential_fallback &&
    -+	set_checkout_config 2 100 &&
    -+	test_checkout_workers 0 \
    -+		git -C various_sequential_fallback checkout --recurse-submodules B2 &&
    -+	verify_checkout various_sequential_fallback
    -+'
    ++		# The just copied files have more recent timestamps than their
    ++		# associated index entries. So refresh the cached timestamps
    ++		# to avoid an "entry not up-to-date" error from `git checkout`.
    ++		# We only have to do this for the submodules as `git checkout`
    ++		# will already refresh the superproject index before performing
    ++		# the up-to-date check.
    ++		#
    ++		git -C $repo submodule foreach "git update-index --refresh" &&
     +
    -+test_expect_success SYMLINKS 'parallel checkout on clone' '
    -+	git -C various checkout --recurse-submodules B2 &&
    -+	set_checkout_config 2 0 &&
    -+	test_checkout_workers 2 \
    -+		git clone --recurse-submodules various various_parallel_clone &&
    -+	verify_checkout various_parallel_clone
    -+'
    ++		set_checkout_config $workers $threshold &&
    ++		test_checkout_workers $expected_workers \
    ++			git -C $repo checkout --recurse-submodules B2 &&
    ++		verify_checkout $repo
    ++	'
    ++done
     +
    -+test_expect_success SYMLINKS 'fallback to sequential checkout on clone (threshold)' '
    -+	git -C various checkout --recurse-submodules B2 &&
    -+	set_checkout_config 2 100 &&
    -+	test_checkout_workers 0 \
    -+		git clone --recurse-submodules various various_sequential_fallback_clone &&
    -+	verify_checkout various_sequential_fallback_clone
    -+'
    ++for mode in parallel sequential-fallback
    ++do
    ++	case $mode in
    ++	parallel)            workers=2 threshold=0 expected_workers=2 ;;
    ++	sequential-fallback) workers=2 threshold=100 expected_workers=0 ;;
    ++	esac
    ++
    ++	test_expect_success "$mode checkout on clone" '
    ++		repo=various_${mode}_clone &&
    ++		set_checkout_config $workers $threshold &&
    ++		test_checkout_workers $expected_workers \
    ++			git clone --recurse-submodules --branch B2 various $repo &&
    ++		verify_checkout $repo
    ++	'
    ++done
     +
     +# Just to be paranoid, actually compare the working trees' contents directly.
    -+test_expect_success SYMLINKS 'compare the working trees' '
    ++test_expect_success 'compare the working trees' '
     +	rm -rf various_*/.git &&
    -+	rm -rf various_*/d/.git &&
    ++	rm -rf various_*/*/.git &&
    ++
    ++	# We use `git diff` instead of `diff -r` because the latter would
    ++	# follow symlinks, and not all `diff` implementations support the
    ++	# `--no-dereference` option.
    ++	#
    ++	git diff --no-index various_sequential various_parallel &&
    ++	git diff --no-index various_sequential various_parallel_clone &&
    ++	git diff --no-index various_sequential various_sequential-fallback &&
    ++	git diff --no-index various_sequential various_sequential-fallback_clone
    ++'
     +
    -+	diff -r various_sequential various_parallel &&
    -+	diff -r various_sequential various_sequential_fallback &&
    -+	diff -r various_sequential various_parallel_clone &&
    -+	diff -r various_sequential various_sequential_fallback_clone
    ++# Currently, each submodule is checked out in a separated child process, but
    ++# these subprocesses must also be able to use parallel checkout workers to
    ++# write the submodules' entries.
    ++test_expect_success 'submodules can use parallel checkout' '
    ++	set_checkout_config 2 0 &&
    ++	git init super &&
    ++	(
    ++		cd super &&
    ++		git init sub &&
    ++		test_commit -C sub A &&
    ++		test_commit -C sub B &&
    ++		git submodule add ./sub &&
    ++		git commit -m sub &&
    ++		rm sub/* &&
    ++		test_checkout_workers 2 git checkout --recurse-submodules .
    ++	)
     +'
     +
     +test_expect_success 'parallel checkout respects --[no]-force' '
5:  71be08b895 ! 5:  eae6d3a1c1 parallel-checkout: add tests related to path collisions
    @@ Commit message
         checkout workers, both to avoid race conditions and to report colliding
         entries on clone.
     
    -    Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
    -    Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
    +    Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## parallel-checkout.c ##
    @@ parallel-checkout.c: void write_pc_item(struct parallel_checkout_item *pc_item,
     
      ## t/lib-parallel-checkout.sh ##
     @@ t/lib-parallel-checkout.sh: test_checkout_workers () {
    - 	shift &&
      
    - 	rm -f trace &&
    --	GIT_TRACE2="$(pwd)/trace" "$@" &&
    -+	GIT_TRACE2="$(pwd)/trace" "$@" 2>&8 &&
    + 	local trace_file=trace-test-checkout-workers &&
    + 	rm -f "$trace_file" &&
    +-	GIT_TRACE2="$(pwd)/$trace_file" "$@" &&
    ++	GIT_TRACE2="$(pwd)/$trace_file" "$@" 2>&8 &&
      
    - 	workers=$(grep "child_start\[..*\] git checkout--worker" trace | wc -l) &&
    + 	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
      	test $workers -eq $expected_workers &&
    - 	rm -f trace
    + 	rm "$trace_file"
     -}
     +} 8>&2 2>&4
      
-:  ---------- > 6:  9161cd1503 t0028: extract encoding helpers to lib-encoding.sh
6:  1dedfd994a ! 7:  bc584897e6 parallel-checkout: add tests related to .gitattributes
    @@ Commit message
         require external filters, are correctly smudge and written when
         parallel-checkout is enabled.
     
    -    Note: to avoid repeating code, some helper functions are extracted from
    -    t0028 into a common lib file.
    -
    -    Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
    -    Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
    +    Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
    - ## t/lib-encoding.sh (new) ##
    -@@
    -+# Encoding helpers used by t0028 and t2082
    -+
    -+test_lazy_prereq NO_UTF16_BOM '
    -+	test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
    -+'
    -+
    -+test_lazy_prereq NO_UTF32_BOM '
    -+	test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
    -+'
    -+
    -+write_utf16 () {
    -+	if test_have_prereq NO_UTF16_BOM
    -+	then
    -+		printf '\376\377'
    -+	fi &&
    -+	iconv -f UTF-8 -t UTF-16
    -+}
    -+
    -+write_utf32 () {
    -+	if test_have_prereq NO_UTF32_BOM
    -+	then
    -+		printf '\0\0\376\377'
    -+	fi &&
    -+	iconv -f UTF-8 -t UTF-32
    -+}
    -
    - ## t/t0028-working-tree-encoding.sh ##
    -@@ t/t0028-working-tree-encoding.sh: GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
    - export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    - 
    - . ./test-lib.sh
    -+. "$TEST_DIRECTORY/lib-encoding.sh"
    - 
    - GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
    - 
    --test_lazy_prereq NO_UTF16_BOM '
    --	test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
    --'
    --
    --test_lazy_prereq NO_UTF32_BOM '
    --	test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
    --'
    --
    --write_utf16 () {
    --	if test_have_prereq NO_UTF16_BOM
    --	then
    --		printf '\376\377'
    --	fi &&
    --	iconv -f UTF-8 -t UTF-16
    --}
    --
    --write_utf32 () {
    --	if test_have_prereq NO_UTF32_BOM
    --	then
    --		printf '\0\0\376\377'
    --	fi &&
    --	iconv -f UTF-8 -t UTF-32
    --}
    --
    - test_expect_success 'setup test files' '
    - 	git config core.eol lf &&
    - 
    -
      ## t/t2082-parallel-checkout-attributes.sh (new) ##
     @@
     +#!/bin/sh
7:  3cc5b10081 ! 8:  1bc5c523c5 ci: run test round with parallel-checkout enabled
    @@ t/README: and "sha256".
     
      ## t/lib-parallel-checkout.sh ##
     @@
    - # Helpers for t208* tests
    + # Helpers for tests invoking parallel-checkout
      
     +# Parallel checkout tests need full control of the number of workers
     +unset GIT_TEST_CHECKOUT_WORKERS
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 1/8] make_transient_cache_entry(): optionally alloc from mem_pool
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
@ 2021-04-30 21:40   ` Matheus Tavares
  2021-05-01 17:06     ` Christian Couder
  2021-04-30 21:40   ` [PATCH v2 2/8] builtin/checkout.c: complete parallel checkout support Matheus Tavares
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Matheus Tavares @ 2021-04-30 21:40 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee

Allow make_transient_cache_entry() to optionally receive a mem_pool
struct in which it should allocate the entry. This will be used in the
following patch, to store some transient entries which should persist
until parallel checkout finishes.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/checkout--worker.c |  2 +-
 builtin/checkout.c         |  2 +-
 builtin/difftool.c         |  2 +-
 cache.h                    | 11 ++++++-----
 read-cache.c               | 12 ++++++++----
 unpack-trees.c             |  2 +-
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/checkout--worker.c b/builtin/checkout--worker.c
index 31e0de2f7e..289a9b8f89 100644
--- a/builtin/checkout--worker.c
+++ b/builtin/checkout--worker.c
@@ -39,7 +39,7 @@ static void packet_to_pc_item(const char *buffer, int len,
 	}
 
 	memset(pc_item, 0, sizeof(*pc_item));
-	pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+	pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len, NULL);
 	pc_item->ce->ce_namelen = fixed_portion->name_len;
 	pc_item->ce->ce_mode = fixed_portion->ce_mode;
 	memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 4c696ef480..db667d0267 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -291,7 +291,7 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko
 	if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
 		die(_("Unable to add merge result for '%s'"), path);
 	free(result_buf.ptr);
-	ce = make_transient_cache_entry(mode, &oid, path, 2);
+	ce = make_transient_cache_entry(mode, &oid, path, 2, NULL);
 	if (!ce)
 		die(_("make_cache_entry failed for path '%s'"), path);
 	status = checkout_entry(ce, state, NULL, nr_checkouts);
diff --git a/builtin/difftool.c b/builtin/difftool.c
index ef25729d49..afacbcd581 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -323,7 +323,7 @@ static int checkout_path(unsigned mode, struct object_id *oid,
 	struct cache_entry *ce;
 	int ret;
 
-	ce = make_transient_cache_entry(mode, oid, path, 0);
+	ce = make_transient_cache_entry(mode, oid, path, 0, NULL);
 	ret = checkout_entry(ce, state, NULL, NULL);
 
 	discard_cache_entry(ce);
diff --git a/cache.h b/cache.h
index 148d9ab5f1..b6b42cc3f3 100644
--- a/cache.h
+++ b/cache.h
@@ -356,16 +356,17 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate,
 					   size_t name_len);
 
 /*
- * Create a cache_entry that is not intended to be added to an index.
- * Caller is responsible for discarding the cache_entry
- * with `discard_cache_entry`.
+ * Create a cache_entry that is not intended to be added to an index. If `mp`
+ * is not NULL, the entry is allocated within the given memory pool. Caller is
+ * responsible for discarding "loose" entries with `discard_cache_entry()` and
+ * the mem_pool with `mem_pool_discard(mp, should_validate_cache_entries())`.
  */
 struct cache_entry *make_transient_cache_entry(unsigned int mode,
 					       const struct object_id *oid,
 					       const char *path,
-					       int stage);
+					       int stage, struct mem_pool *mp);
 
-struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp);
 
 /*
  * Discard cache entry.
diff --git a/read-cache.c b/read-cache.c
index 5a907af2fb..eb389ccb8f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -813,8 +813,10 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate, size_t le
 	return mem_pool__ce_calloc(find_mem_pool(istate), len);
 }
 
-struct cache_entry *make_empty_transient_cache_entry(size_t len)
+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp)
 {
+	if (mp)
+		return mem_pool__ce_calloc(mp, len);
 	return xcalloc(1, cache_entry_size(len));
 }
 
@@ -848,8 +850,10 @@ struct cache_entry *make_cache_entry(struct index_state *istate,
 	return ret;
 }
 
-struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct object_id *oid,
-					       const char *path, int stage)
+struct cache_entry *make_transient_cache_entry(unsigned int mode,
+					       const struct object_id *oid,
+					       const char *path, int stage,
+					       struct mem_pool *mp)
 {
 	struct cache_entry *ce;
 	int len;
@@ -860,7 +864,7 @@ struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct o
 	}
 
 	len = strlen(path);
-	ce = make_empty_transient_cache_entry(len);
+	ce = make_empty_transient_cache_entry(len, mp);
 
 	oidcpy(&ce->oid, oid);
 	memcpy(ce->name, path, len);
diff --git a/unpack-trees.c b/unpack-trees.c
index 4b77e52c6b..fa5b7ab7ee 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1034,7 +1034,7 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	size_t len = traverse_path_len(info, tree_entry_len(n));
 	struct cache_entry *ce =
 		is_transient ?
-		make_empty_transient_cache_entry(len) :
+		make_empty_transient_cache_entry(len, NULL) :
 		make_empty_cache_entry(istate, len);
 
 	ce->ce_mode = create_ce_mode(n->mode);
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 2/8] builtin/checkout.c: complete parallel checkout support
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
  2021-04-30 21:40   ` [PATCH v2 1/8] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
@ 2021-04-30 21:40   ` Matheus Tavares
  2021-05-01 17:08     ` Christian Couder
  2021-04-30 21:40   ` [PATCH v2 3/8] checkout-index: add " Matheus Tavares
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Matheus Tavares @ 2021-04-30 21:40 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee

Pathspec-limited checkouts (like `git checkout *.txt`) are performed by
a code path that doesn't yet support parallel checkout because it calls
checkout_entry() directly, instead of unpack_trees(). Let's add parallel
checkout support for this code path too.

Note: the transient cache entries allocated in checkout_merged() are now
allocated in a mem_pool which is only discarded after parallel checkout
finishes. This is done because the entries need to be valid when
run_parallel_checkout() is called.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/checkout.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index db667d0267..b71dc08430 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -27,6 +27,7 @@
 #include "wt-status.h"
 #include "xdiff-interface.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 static const char * const checkout_usage[] = {
 	N_("git checkout [<options>] <branch>"),
@@ -230,7 +231,8 @@ static int checkout_stage(int stage, const struct cache_entry *ce, int pos,
 		return error(_("path '%s' does not have their version"), ce->name);
 }
 
-static int checkout_merged(int pos, const struct checkout *state, int *nr_checkouts)
+static int checkout_merged(int pos, const struct checkout *state,
+			   int *nr_checkouts, struct mem_pool *ce_mem_pool)
 {
 	struct cache_entry *ce = active_cache[pos];
 	const char *path = ce->name;
@@ -291,11 +293,10 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko
 	if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
 		die(_("Unable to add merge result for '%s'"), path);
 	free(result_buf.ptr);
-	ce = make_transient_cache_entry(mode, &oid, path, 2, NULL);
+	ce = make_transient_cache_entry(mode, &oid, path, 2, ce_mem_pool);
 	if (!ce)
 		die(_("make_cache_entry failed for path '%s'"), path);
 	status = checkout_entry(ce, state, NULL, nr_checkouts);
-	discard_cache_entry(ce);
 	return status;
 }
 
@@ -359,16 +360,22 @@ static int checkout_worktree(const struct checkout_opts *opts,
 	int nr_checkouts = 0, nr_unmerged = 0;
 	int errs = 0;
 	int pos;
+	int pc_workers, pc_threshold;
+	struct mem_pool ce_mem_pool;
 
 	state.force = 1;
 	state.refresh_cache = 1;
 	state.istate = &the_index;
 
+	mem_pool_init(&ce_mem_pool, 0);
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
 	init_checkout_metadata(&state.meta, info->refname,
 			       info->commit ? &info->commit->object.oid : &info->oid,
 			       NULL);
 
 	enable_delayed_checkout(&state);
+	if (pc_workers > 1)
+		init_parallel_checkout();
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
@@ -384,10 +391,15 @@ static int checkout_worktree(const struct checkout_opts *opts,
 						       &nr_checkouts, opts->overlay_mode);
 			else if (opts->merge)
 				errs |= checkout_merged(pos, &state,
-							&nr_unmerged);
+							&nr_unmerged,
+							&ce_mem_pool);
 			pos = skip_same_name(ce, pos) - 1;
 		}
 	}
+	if (pc_workers > 1)
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+					      NULL, NULL);
+	mem_pool_discard(&ce_mem_pool, should_validate_cache_entries());
 	remove_marked_cache_entries(&the_index, 1);
 	remove_scheduled_dirs();
 	errs |= finish_delayed_checkout(&state, &nr_checkouts);
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 3/8] checkout-index: add parallel checkout support
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
  2021-04-30 21:40   ` [PATCH v2 1/8] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
  2021-04-30 21:40   ` [PATCH v2 2/8] builtin/checkout.c: complete parallel checkout support Matheus Tavares
@ 2021-04-30 21:40   ` Matheus Tavares
  2021-05-01 17:08     ` Christian Couder
  2021-04-30 21:40   ` [PATCH v2 4/8] parallel-checkout: add tests for basic operations Matheus Tavares
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Matheus Tavares @ 2021-04-30 21:40 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee

Note: previously, `checkout_all()` would not return on errors, but
instead call `exit()` with a non-zero code. However, it only did that
after calling `checkout_entry()` for all index entries, thus not
stopping on the first error, but attempting to write the maximum number
of entries possible. In order to maintain this behavior we now propagate
`checkout_all()`s error status to `cmd_checkout_index()`, so that it can
call `run_parallel_checkout()` and attempt to write the queued entries
before exiting with the error code.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/checkout-index.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index c0bf4ac1b2..e8a82ea9ed 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -12,6 +12,7 @@
 #include "cache-tree.h"
 #include "parse-options.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 #define CHECKOUT_ALL 4
 static int nul_term_line;
@@ -115,7 +116,7 @@ static int checkout_file(const char *name, const char *prefix)
 	return -1;
 }
 
-static void checkout_all(const char *prefix, int prefix_length)
+static int checkout_all(const char *prefix, int prefix_length)
 {
 	int i, errs = 0;
 	struct cache_entry *last_ce = NULL;
@@ -142,11 +143,7 @@ static void checkout_all(const char *prefix, int prefix_length)
 	}
 	if (last_ce && to_tempfile)
 		write_tempfile_record(last_ce->name, prefix);
-	if (errs)
-		/* we have already done our error reporting.
-		 * exit with the same code as die().
-		 */
-		exit(128);
+	return !!errs;
 }
 
 static const char * const builtin_checkout_index_usage[] = {
@@ -182,6 +179,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 	int force = 0, quiet = 0, not_new = 0;
 	int index_opt = 0;
 	int err = 0;
+	int pc_workers, pc_threshold;
 	struct option builtin_checkout_index_options[] = {
 		OPT_BOOL('a', "all", &all,
 			N_("check out all files in the index")),
@@ -236,6 +234,10 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 		hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 	}
 
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+	if (pc_workers > 1)
+		init_parallel_checkout();
+
 	/* Check out named files first */
 	for (i = 0; i < argc; i++) {
 		const char *arg = argv[i];
@@ -275,12 +277,16 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	if (all)
+		err |= checkout_all(prefix, prefix_length);
+
+	if (pc_workers > 1)
+		err |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+					     NULL, NULL);
+
 	if (err)
 		return 1;
 
-	if (all)
-		checkout_all(prefix, prefix_length);
-
 	if (is_lock_file_locked(&lock_file) &&
 	    write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die("Unable to write new index file");
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 4/8] parallel-checkout: add tests for basic operations
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
                     ` (2 preceding siblings ...)
  2021-04-30 21:40   ` [PATCH v2 3/8] checkout-index: add " Matheus Tavares
@ 2021-04-30 21:40   ` Matheus Tavares
  2021-04-30 21:40   ` [PATCH v2 5/8] parallel-checkout: add tests related to path collisions Matheus Tavares
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-04-30 21:40 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee

Add tests to populate the working tree during clone and checkout using
sequential and parallel mode, to confirm that they produce identical
results. Also test basic checkout mechanics, such as checking for
symlinks in the leading directories and the abidance to --force.

Note: some helper functions are added to a common lib file which is only
included by t2080 for now. But they will also be used by other
parallel-checkout tests in the following patches.

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/lib-parallel-checkout.sh          |  42 +++++
 t/t2080-parallel-checkout-basics.sh | 229 ++++++++++++++++++++++++++++
 2 files changed, 271 insertions(+)
 create mode 100644 t/lib-parallel-checkout.sh
 create mode 100755 t/t2080-parallel-checkout-basics.sh

diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
new file mode 100644
index 0000000000..f60b22ef34
--- /dev/null
+++ b/t/lib-parallel-checkout.sh
@@ -0,0 +1,42 @@
+# Helpers for tests invoking parallel-checkout
+
+set_checkout_config () {
+	if test $# -ne 2
+	then
+		BUG "usage: set_checkout_config <workers> <threshold>"
+	fi &&
+
+	test_config_global checkout.workers $1 &&
+	test_config_global checkout.thresholdForParallelism $2
+}
+
+# Run "${@:2}" and check that $1 checkout workers were used
+test_checkout_workers () {
+	if test $# -lt 2
+	then
+		BUG "too few arguments to test_checkout_workers"
+	fi &&
+
+	local expected_workers=$1 &&
+	shift &&
+
+	local trace_file=trace-test-checkout-workers &&
+	rm -f "$trace_file" &&
+	GIT_TRACE2="$(pwd)/$trace_file" "$@" &&
+
+	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
+	test $workers -eq $expected_workers &&
+	rm "$trace_file"
+}
+
+# Verify that both the working tree and the index were created correctly
+verify_checkout () {
+	if test $# -ne 1
+	then
+		BUG "usage: verify_checkout <repository path>"
+	fi &&
+
+	git -C "$1" diff-index --ignore-submodules=none --exit-code HEAD -- &&
+	git -C "$1" status --porcelain >"$1".status &&
+	test_must_be_empty "$1".status
+}
diff --git a/t/t2080-parallel-checkout-basics.sh b/t/t2080-parallel-checkout-basics.sh
new file mode 100755
index 0000000000..7087818550
--- /dev/null
+++ b/t/t2080-parallel-checkout-basics.sh
@@ -0,0 +1,229 @@
+#!/bin/sh
+
+test_description='parallel-checkout basics
+
+Ensure that parallel-checkout basically works on clone and checkout, spawning
+the required number of workers and correctly populating both the index and the
+working tree.
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+# Test parallel-checkout with a branch switch containing a variety of file
+# creations, deletions, and modifications, involving different entry types.
+# The branches B1 and B2 have the following paths:
+#
+#      B1                 B2
+#  a/a (file)         a   (file)
+#  b   (file)         b/b (file)
+#
+#  c/c (file)         c   (symlink)
+#  d   (symlink)      d/d (file)
+#
+#  e/e (file)         e   (submodule)
+#  f   (submodule)    f/f (file)
+#
+#  g   (submodule)    g   (symlink)
+#  h   (symlink)      h   (submodule)
+#
+# Additionally, the following paths are present on both branches, but with
+# different contents:
+#
+#  i   (file)         i   (file)
+#  j   (symlink)      j   (symlink)
+#  k   (submodule)    k   (submodule)
+#
+# And the following paths are only present in one of the branches:
+#
+#  l/l (file)         -
+#  -                  m/m (file)
+#
+test_expect_success 'setup repo for checkout with various types of changes' '
+	git init sub &&
+	(
+		cd sub &&
+		git checkout -b B2 &&
+		echo B2 >file &&
+		git add file &&
+		git commit -m file &&
+
+		git checkout -b B1 &&
+		echo B1 >file &&
+		git add file &&
+		git commit -m file
+	) &&
+
+	git init various &&
+	(
+		cd various &&
+
+		git checkout -b B1 &&
+		mkdir a c e &&
+		echo a/a >a/a &&
+		echo b >b &&
+		echo c/c >c/c &&
+		test_ln_s_add c d &&
+		echo e/e >e/e &&
+		git submodule add ../sub f &&
+		git submodule add ../sub g &&
+		test_ln_s_add c h &&
+
+		echo "B1 i" >i &&
+		test_ln_s_add c j &&
+		git submodule add -b B1 ../sub k &&
+		mkdir l &&
+		echo l/l >l/l &&
+
+		git add . &&
+		git commit -m B1 &&
+
+		git checkout -b B2 &&
+		git rm -rf :^.gitmodules :^k &&
+		mkdir b d f &&
+		echo a >a &&
+		echo b/b >b/b &&
+		test_ln_s_add b c &&
+		echo d/d >d/d &&
+		git submodule add ../sub e &&
+		echo f/f >f/f &&
+		test_ln_s_add b g &&
+		git submodule add ../sub h &&
+
+		echo "B2 i" >i &&
+		test_ln_s_add b j &&
+		git -C k checkout B2 &&
+		mkdir m &&
+		echo m/m >m/m &&
+
+		git add . &&
+		git commit -m B2 &&
+
+		git checkout --recurse-submodules B1
+	)
+'
+
+for mode in sequential parallel sequential-fallback
+do
+	case $mode in
+	sequential)          workers=1 threshold=0 expected_workers=0 ;;
+	parallel)            workers=2 threshold=0 expected_workers=2 ;;
+	sequential-fallback) workers=2 threshold=100 expected_workers=0 ;;
+	esac
+
+	test_expect_success "$mode checkout" '
+		repo=various_$mode &&
+		cp -R various $repo &&
+
+		# The just copied files have more recent timestamps than their
+		# associated index entries. So refresh the cached timestamps
+		# to avoid an "entry not up-to-date" error from `git checkout`.
+		# We only have to do this for the submodules as `git checkout`
+		# will already refresh the superproject index before performing
+		# the up-to-date check.
+		#
+		git -C $repo submodule foreach "git update-index --refresh" &&
+
+		set_checkout_config $workers $threshold &&
+		test_checkout_workers $expected_workers \
+			git -C $repo checkout --recurse-submodules B2 &&
+		verify_checkout $repo
+	'
+done
+
+for mode in parallel sequential-fallback
+do
+	case $mode in
+	parallel)            workers=2 threshold=0 expected_workers=2 ;;
+	sequential-fallback) workers=2 threshold=100 expected_workers=0 ;;
+	esac
+
+	test_expect_success "$mode checkout on clone" '
+		repo=various_${mode}_clone &&
+		set_checkout_config $workers $threshold &&
+		test_checkout_workers $expected_workers \
+			git clone --recurse-submodules --branch B2 various $repo &&
+		verify_checkout $repo
+	'
+done
+
+# Just to be paranoid, actually compare the working trees' contents directly.
+test_expect_success 'compare the working trees' '
+	rm -rf various_*/.git &&
+	rm -rf various_*/*/.git &&
+
+	# We use `git diff` instead of `diff -r` because the latter would
+	# follow symlinks, and not all `diff` implementations support the
+	# `--no-dereference` option.
+	#
+	git diff --no-index various_sequential various_parallel &&
+	git diff --no-index various_sequential various_parallel_clone &&
+	git diff --no-index various_sequential various_sequential-fallback &&
+	git diff --no-index various_sequential various_sequential-fallback_clone
+'
+
+# Currently, each submodule is checked out in a separated child process, but
+# these subprocesses must also be able to use parallel checkout workers to
+# write the submodules' entries.
+test_expect_success 'submodules can use parallel checkout' '
+	set_checkout_config 2 0 &&
+	git init super &&
+	(
+		cd super &&
+		git init sub &&
+		test_commit -C sub A &&
+		test_commit -C sub B &&
+		git submodule add ./sub &&
+		git commit -m sub &&
+		rm sub/* &&
+		test_checkout_workers 2 git checkout --recurse-submodules .
+	)
+'
+
+test_expect_success 'parallel checkout respects --[no]-force' '
+	set_checkout_config 2 0 &&
+	git init dirty &&
+	(
+		cd dirty &&
+		mkdir D &&
+		test_commit D/F &&
+		test_commit F &&
+
+		rm -rf D &&
+		echo changed >D &&
+		echo changed >F.t &&
+
+		# We expect 0 workers because there is nothing to be done
+		test_checkout_workers 0 git checkout HEAD &&
+		test_path_is_file D &&
+		grep changed D &&
+		grep changed F.t &&
+
+		test_checkout_workers 2 git checkout --force HEAD &&
+		test_path_is_dir D &&
+		grep D/F D/F.t &&
+		grep F F.t
+	)
+'
+
+test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading dirs' '
+	set_checkout_config 2 0 &&
+	git init symlinks &&
+	(
+		cd symlinks &&
+		mkdir D untracked &&
+		# Commit 2 files to have enough work for 2 parallel workers
+		test_commit D/A &&
+		test_commit D/B &&
+		rm -rf D &&
+		ln -s untracked D &&
+
+		test_checkout_workers 2 git checkout --force HEAD &&
+		! test -h D &&
+		grep D/A D/A.t &&
+		grep D/B D/B.t
+	)
+'
+
+test_done
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 5/8] parallel-checkout: add tests related to path collisions
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
                     ` (3 preceding siblings ...)
  2021-04-30 21:40   ` [PATCH v2 4/8] parallel-checkout: add tests for basic operations Matheus Tavares
@ 2021-04-30 21:40   ` Matheus Tavares
  2021-05-02  7:59     ` Torsten Bögershausen
  2021-04-30 21:40   ` [PATCH v2 6/8] t0028: extract encoding helpers to lib-encoding.sh Matheus Tavares
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Matheus Tavares @ 2021-04-30 21:40 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee

Add tests to confirm that path collisions are properly detected by
checkout workers, both to avoid race conditions and to report colliding
entries on clone.

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 parallel-checkout.c                     |   4 +
 t/lib-parallel-checkout.sh              |   4 +-
 t/t2081-parallel-checkout-collisions.sh | 162 ++++++++++++++++++++++++
 3 files changed, 168 insertions(+), 2 deletions(-)
 create mode 100755 t/t2081-parallel-checkout-collisions.sh

diff --git a/parallel-checkout.c b/parallel-checkout.c
index 09e8b10a35..6fb3f1e6c9 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -8,6 +8,7 @@
 #include "sigchain.h"
 #include "streaming.h"
 #include "thread-utils.h"
+#include "trace2.h"
 
 struct pc_worker {
 	struct child_process cp;
@@ -326,6 +327,7 @@ void write_pc_item(struct parallel_checkout_item *pc_item,
 	if (dir_sep && !has_dirs_only_path(path.buf, dir_sep - path.buf,
 					   state->base_dir_len)) {
 		pc_item->status = PC_ITEM_COLLIDED;
+		trace2_data_string("pcheckout", NULL, "collision/dirname", path.buf);
 		goto out;
 	}
 
@@ -341,6 +343,8 @@ void write_pc_item(struct parallel_checkout_item *pc_item,
 			 * call should have already caught these cases.
 			 */
 			pc_item->status = PC_ITEM_COLLIDED;
+			trace2_data_string("pcheckout", NULL,
+					   "collision/basename", path.buf);
 		} else {
 			error_errno("failed to open file '%s'", path.buf);
 			pc_item->status = PC_ITEM_FAILED;
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index f60b22ef34..d6740425b1 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -22,12 +22,12 @@ test_checkout_workers () {
 
 	local trace_file=trace-test-checkout-workers &&
 	rm -f "$trace_file" &&
-	GIT_TRACE2="$(pwd)/$trace_file" "$@" &&
+	GIT_TRACE2="$(pwd)/$trace_file" "$@" 2>&8 &&
 
 	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
 	test $workers -eq $expected_workers &&
 	rm "$trace_file"
-}
+} 8>&2 2>&4
 
 # Verify that both the working tree and the index were created correctly
 verify_checkout () {
diff --git a/t/t2081-parallel-checkout-collisions.sh b/t/t2081-parallel-checkout-collisions.sh
new file mode 100755
index 0000000000..f6fcfc0c1e
--- /dev/null
+++ b/t/t2081-parallel-checkout-collisions.sh
@@ -0,0 +1,162 @@
+#!/bin/sh
+
+test_description="path collisions during parallel checkout
+
+Parallel checkout must detect path collisions to:
+
+1) Avoid racily writing to different paths that represent the same file on disk.
+2) Report the colliding entries on clone.
+
+The tests in this file exercise parallel checkout's collision detection code in
+both these mechanics.
+"
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+TEST_ROOT="$PWD"
+
+test_expect_success CASE_INSENSITIVE_FS 'setup' '
+	empty_oid=$(git hash-object -w --stdin </dev/null) &&
+	cat >objs <<-EOF &&
+	100644 $empty_oid	FILE_X
+	100644 $empty_oid	FILE_x
+	100644 $empty_oid	file_X
+	100644 $empty_oid	file_x
+	EOF
+	git update-index --index-info <objs &&
+	git commit -m "colliding files" &&
+	git tag basename_collision &&
+
+	write_script "$TEST_ROOT"/logger_script <<-\EOF
+	echo "$@" >>filter.log
+	EOF
+'
+
+test_workers_in_event_trace ()
+{
+	test $1 -eq $(grep ".event.:.child_start..*checkout--worker" $2 | wc -l)
+}
+
+test_expect_success CASE_INSENSITIVE_FS 'worker detects basename collision' '
+	GIT_TRACE2_EVENT="$(pwd)/trace" git \
+		-c checkout.workers=2 -c checkout.thresholdForParallelism=0 \
+		checkout . &&
+
+	test_workers_in_event_trace 2 trace &&
+	collisions=$(grep -i "category.:.pcheckout.,.key.:.collision/basename.,.value.:.file_x.}" trace | wc -l) &&
+	test $collisions -eq 3
+'
+
+test_expect_success CASE_INSENSITIVE_FS 'worker detects dirname collision' '
+	test_config filter.logger.smudge "\"$TEST_ROOT/logger_script\" %f" &&
+	empty_oid=$(git hash-object -w --stdin </dev/null) &&
+
+	# By setting a filter command to "a", we make it ineligible for parallel
+	# checkout, and thus it is checked out *first*. This way we can ensure
+	# that "A/B" and "A/C" will both collide with the regular file "a".
+	#
+	attr_oid=$(echo "a filter=logger" | git hash-object -w --stdin) &&
+
+	cat >objs <<-EOF &&
+	100644 $empty_oid	A/B
+	100644 $empty_oid	A/C
+	100644 $empty_oid	a
+	100644 $attr_oid	.gitattributes
+	EOF
+	git rm -rf . &&
+	git update-index --index-info <objs &&
+
+	rm -f trace filter.log &&
+	GIT_TRACE2_EVENT="$(pwd)/trace" git \
+		-c checkout.workers=2 -c checkout.thresholdForParallelism=0 \
+		checkout . &&
+
+	# Check that "a" (and only "a") was filtered
+	echo a >expected.log &&
+	test_cmp filter.log expected.log &&
+
+	# Check that it used the right number of workers and detected the collisions
+	test_workers_in_event_trace 2 trace &&
+	grep "category.:.pcheckout.,.key.:.collision/dirname.,.value.:.A/B.}" trace &&
+	grep "category.:.pcheckout.,.key.:.collision/dirname.,.value.:.A/C.}" trace
+'
+
+test_expect_success SYMLINKS,CASE_INSENSITIVE_FS 'do not follow symlinks colliding with leading dir' '
+	empty_oid=$(git hash-object -w --stdin </dev/null) &&
+	symlink_oid=$(echo "./e" | git hash-object -w --stdin) &&
+	mkdir e &&
+
+	cat >objs <<-EOF &&
+	120000 $symlink_oid	D
+	100644 $empty_oid	d/x
+	100644 $empty_oid	e/y
+	EOF
+	git rm -rf . &&
+	git update-index --index-info <objs &&
+
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 git checkout . &&
+	test_path_is_dir e &&
+	test_path_is_missing e/x
+'
+
+# The two following tests check that parallel checkout correctly reports
+# colliding entries on clone. The sequential code detects a collision by
+# calling lstat() before trying to open(O_CREAT) a file. (Note that this only
+# works for clone.) Then, to find the pair of a colliding item k, it searches
+# cache_entry[0, k-1]. This is not sufficient in parallel checkout because:
+#
+# - A colliding file may be created between the lstat() and open() calls;
+# - A colliding entry might appear in the second half of the cache_entry array.
+#
+test_expect_success CASE_INSENSITIVE_FS 'collision report on clone (w/ racy file creation)' '
+	git reset --hard basename_collision &&
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 git clone . clone-repo 2>stderr &&
+
+	grep FILE_X stderr &&
+	grep FILE_x stderr &&
+	grep file_X stderr &&
+	grep file_x stderr &&
+	grep "the following paths have collided" stderr
+'
+
+# This test ensures that the collision report code is correctly looking for
+# colliding peers in the second half of the cache_entry array. This is done by
+# defining a smudge command for the *last* array entry, which makes it
+# non-eligible for parallel-checkout. Thus, it is checked out *first*, before
+# spawning the workers.
+#
+# Note: this test doesn't work on Windows because, on this system, the
+# collision report code uses strcmp() to find the colliding pairs when
+# core.ignoreCase is false. And we need this setting for this test so that only
+# 'file_x' matches the pattern of the filter attribute. But the test works on
+# OSX, where the colliding pairs are found using inode.
+#
+test_expect_success CASE_INSENSITIVE_FS,!MINGW,!CYGWIN \
+	'collision report on clone (w/ colliding peer after the detected entry)' '
+
+	test_config_global filter.logger.smudge "\"$TEST_ROOT/logger_script\" %f" &&
+	git reset --hard basename_collision &&
+	echo "file_x filter=logger" >.gitattributes &&
+	git add .gitattributes &&
+	git commit -m "filter for file_x" &&
+
+	rm -rf clone-repo &&
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 \
+		git -c core.ignoreCase=false clone . clone-repo 2>stderr &&
+
+	grep FILE_X stderr &&
+	grep FILE_x stderr &&
+	grep file_X stderr &&
+	grep file_x stderr &&
+	grep "the following paths have collided" stderr &&
+
+	# Check that only "file_x" was filtered
+	echo file_x >expected.log &&
+	test_cmp clone-repo/filter.log expected.log
+'
+
+test_done
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 6/8] t0028: extract encoding helpers to lib-encoding.sh
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
                     ` (4 preceding siblings ...)
  2021-04-30 21:40   ` [PATCH v2 5/8] parallel-checkout: add tests related to path collisions Matheus Tavares
@ 2021-04-30 21:40   ` Matheus Tavares
  2021-04-30 21:40   ` [PATCH v2 7/8] parallel-checkout: add tests related to .gitattributes Matheus Tavares
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-04-30 21:40 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee

The following patch will add tests outside t0028 which will also need to
re-encode some strings. Extract the auxiliary encoding functions from
t0028 to a common lib file so that they can be reused.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/lib-encoding.sh                | 25 +++++++++++++++++++++++++
 t/t0028-working-tree-encoding.sh | 25 +------------------------
 2 files changed, 26 insertions(+), 24 deletions(-)
 create mode 100644 t/lib-encoding.sh

diff --git a/t/lib-encoding.sh b/t/lib-encoding.sh
new file mode 100644
index 0000000000..2dabc8c73e
--- /dev/null
+++ b/t/lib-encoding.sh
@@ -0,0 +1,25 @@
+# Encoding helpers
+
+test_lazy_prereq NO_UTF16_BOM '
+	test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
+'
+
+test_lazy_prereq NO_UTF32_BOM '
+	test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
+'
+
+write_utf16 () {
+	if test_have_prereq NO_UTF16_BOM
+	then
+		printf '\376\377'
+	fi &&
+	iconv -f UTF-8 -t UTF-16
+}
+
+write_utf32 () {
+	if test_have_prereq NO_UTF32_BOM
+	then
+		printf '\0\0\376\377'
+	fi &&
+	iconv -f UTF-8 -t UTF-32
+}
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index f970a9806b..82905a2156 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -6,33 +6,10 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY/lib-encoding.sh"
 
 GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
 
-test_lazy_prereq NO_UTF16_BOM '
-	test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
-'
-
-test_lazy_prereq NO_UTF32_BOM '
-	test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
-'
-
-write_utf16 () {
-	if test_have_prereq NO_UTF16_BOM
-	then
-		printf '\376\377'
-	fi &&
-	iconv -f UTF-8 -t UTF-16
-}
-
-write_utf32 () {
-	if test_have_prereq NO_UTF32_BOM
-	then
-		printf '\0\0\376\377'
-	fi &&
-	iconv -f UTF-8 -t UTF-32
-}
-
 test_expect_success 'setup test files' '
 	git config core.eol lf &&
 
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 7/8] parallel-checkout: add tests related to .gitattributes
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
                     ` (5 preceding siblings ...)
  2021-04-30 21:40   ` [PATCH v2 6/8] t0028: extract encoding helpers to lib-encoding.sh Matheus Tavares
@ 2021-04-30 21:40   ` Matheus Tavares
  2021-04-30 21:40   ` [PATCH v2 8/8] ci: run test round with parallel-checkout enabled Matheus Tavares
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-04-30 21:40 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee

Add tests to confirm that the `struct conv_attrs` data is correctly
passed from the main process to the workers, and that they can properly
convert the blobs before writing them to the working tree.

Also check that parallel-ineligible entries, such as regular files that
require external filters, are correctly smudge and written when
parallel-checkout is enabled.

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/t2082-parallel-checkout-attributes.sh | 194 ++++++++++++++++++++++++
 1 file changed, 194 insertions(+)
 create mode 100755 t/t2082-parallel-checkout-attributes.sh

diff --git a/t/t2082-parallel-checkout-attributes.sh b/t/t2082-parallel-checkout-attributes.sh
new file mode 100755
index 0000000000..2525457961
--- /dev/null
+++ b/t/t2082-parallel-checkout-attributes.sh
@@ -0,0 +1,194 @@
+#!/bin/sh
+
+test_description='parallel-checkout: attributes
+
+Verify that parallel-checkout correctly creates files that require
+conversions, as specified in .gitattributes. The main point here is
+to check that the conv_attr data is correctly sent to the workers
+and that it contains sufficient information to smudge files
+properly (without access to the index or attribute stack).
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+. "$TEST_DIRECTORY/lib-encoding.sh"
+
+test_expect_success 'parallel-checkout with ident' '
+	set_checkout_config 2 0 &&
+	git init ident &&
+	(
+		cd ident &&
+		echo "A ident" >.gitattributes &&
+		echo "\$Id\$" >A &&
+		echo "\$Id\$" >B &&
+		git add -A &&
+		git commit -m id &&
+
+		rm A B &&
+		test_checkout_workers 2 git reset --hard &&
+		hexsz=$(test_oid hexsz) &&
+		grep -E "\\\$Id: [0-9a-f]{$hexsz} \\\$" A &&
+		grep "\\\$Id\\\$" B
+	)
+'
+
+test_expect_success 'parallel-checkout with re-encoding' '
+	set_checkout_config 2 0 &&
+	git init encoding &&
+	(
+		cd encoding &&
+		echo text >utf8-text &&
+		write_utf16 <utf8-text >utf16-text &&
+
+		echo "A working-tree-encoding=UTF-16" >.gitattributes &&
+		cp utf16-text A &&
+		cp utf8-text B &&
+		git add A B .gitattributes &&
+		git commit -m encoding &&
+
+		# Check that A is stored in UTF-8
+		git cat-file -p :A >A.internal &&
+		test_cmp_bin utf8-text A.internal &&
+
+		rm A B &&
+		test_checkout_workers 2 git checkout A B &&
+
+		# Check that A (and only A) is re-encoded during checkout
+		test_cmp_bin utf16-text A &&
+		test_cmp_bin utf8-text B
+	)
+'
+
+test_expect_success 'parallel-checkout with eol conversions' '
+	set_checkout_config 2 0 &&
+	git init eol &&
+	(
+		cd eol &&
+		printf "multi\r\nline\r\ntext" >crlf-text &&
+		printf "multi\nline\ntext" >lf-text &&
+
+		git config core.autocrlf false &&
+		echo "A eol=crlf" >.gitattributes &&
+		cp crlf-text A &&
+		cp lf-text B &&
+		git add A B .gitattributes &&
+		git commit -m eol &&
+
+		# Check that A is stored with LF format
+		git cat-file -p :A >A.internal &&
+		test_cmp_bin lf-text A.internal &&
+
+		rm A B &&
+		test_checkout_workers 2 git checkout A B &&
+
+		# Check that A (and only A) is converted to CRLF during checkout
+		test_cmp_bin crlf-text A &&
+		test_cmp_bin lf-text B
+	)
+'
+
+# Entries that require an external filter are not eligible for parallel
+# checkout. Check that both the parallel-eligible and non-eligible entries are
+# properly writen in a single checkout operation.
+#
+test_expect_success 'parallel-checkout and external filter' '
+	set_checkout_config 2 0 &&
+	git init filter &&
+	(
+		cd filter &&
+		write_script <<-\EOF rot13.sh &&
+		tr \
+		  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" \
+		  "nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM"
+		EOF
+
+		git config filter.rot13.clean "\"$(pwd)/rot13.sh\"" &&
+		git config filter.rot13.smudge "\"$(pwd)/rot13.sh\"" &&
+		git config filter.rot13.required true &&
+
+		echo abcd >original &&
+		echo nopq >rot13 &&
+
+		echo "A filter=rot13" >.gitattributes &&
+		cp original A &&
+		cp original B &&
+		cp original C &&
+		git add A B C .gitattributes &&
+		git commit -m filter &&
+
+		# Check that A (and only A) was cleaned
+		git cat-file -p :A >A.internal &&
+		test_cmp rot13 A.internal &&
+		git cat-file -p :B >B.internal &&
+		test_cmp original B.internal &&
+		git cat-file -p :C >C.internal &&
+		test_cmp original C.internal &&
+
+		rm A B C *.internal &&
+		test_checkout_workers 2 git checkout A B C &&
+
+		# Check that A (and only A) was smudged during checkout
+		test_cmp original A &&
+		test_cmp original B &&
+		test_cmp original C
+	)
+'
+
+# The delayed queue is independent from the parallel queue, and they should be
+# able to work together in the same checkout process.
+#
+test_expect_success PERL 'parallel-checkout and delayed checkout' '
+	write_script rot13-filter.pl "$PERL_PATH" \
+		<"$TEST_DIRECTORY"/t0021/rot13-filter.pl &&
+
+	test_config_global filter.delay.process \
+		"\"$(pwd)/rot13-filter.pl\" --always-delay \"$(pwd)/delayed.log\" clean smudge delay" &&
+	test_config_global filter.delay.required true &&
+
+	echo "abcd" >original &&
+	echo "nopq" >rot13 &&
+
+	git init delayed &&
+	(
+		cd delayed &&
+		echo "*.d filter=delay" >.gitattributes &&
+		cp ../original W.d &&
+		cp ../original X.d &&
+		cp ../original Y &&
+		cp ../original Z &&
+		git add -A &&
+		git commit -m delayed &&
+
+		# Check that *.d files were cleaned
+		git cat-file -p :W.d >W.d.internal &&
+		test_cmp W.d.internal ../rot13 &&
+		git cat-file -p :X.d >X.d.internal &&
+		test_cmp X.d.internal ../rot13 &&
+		git cat-file -p :Y >Y.internal &&
+		test_cmp Y.internal ../original &&
+		git cat-file -p :Z >Z.internal &&
+		test_cmp Z.internal ../original &&
+
+		rm *
+	) &&
+
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 git -C delayed checkout -f &&
+	verify_checkout delayed &&
+
+	# Check that the *.d files got to the delay queue and were filtered
+	grep "smudge W.d .* \[DELAYED\]" delayed.log &&
+	grep "smudge X.d .* \[DELAYED\]" delayed.log &&
+	test_cmp delayed/W.d original &&
+	test_cmp delayed/X.d original &&
+
+	# Check that the parallel-eligible entries went to the right queue and
+	# were not filtered
+	! grep "smudge Y .* \[DELAYED\]" delayed.log &&
+	! grep "smudge Z .* \[DELAYED\]" delayed.log &&
+	test_cmp delayed/Y original &&
+	test_cmp delayed/Z original
+'
+
+test_done
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 8/8] ci: run test round with parallel-checkout enabled
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
                     ` (6 preceding siblings ...)
  2021-04-30 21:40   ` [PATCH v2 7/8] parallel-checkout: add tests related to .gitattributes Matheus Tavares
@ 2021-04-30 21:40   ` Matheus Tavares
  2021-05-02 10:12   ` [PATCH v2 0/8] Parallel Checkout (part 3) Torsten Bögershausen
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
  9 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-04-30 21:40 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee

We already have tests for the basic parallel-checkout operations. But
this code can also run be executed by other commands, such as
git-read-tree and git-sparse-checkout, which are currently not tested
with multiple workers. To promote a wider test coverage without
duplicating tests:

1. Add the GIT_TEST_CHECKOUT_WORKERS environment variable, to optionally
   force parallel-checkout execution during the whole test suite.

2. Set this variable (with a value of 2) in the second test round of our
   linux-gcc CI job. This round runs `make test` again with some
   optional GIT_TEST_* variables enabled, so there is no additional
   overhead in exercising the parallel-checkout code here.

Note that tests checking out less than two parallel-eligible entries
will fall back to the sequential mode. Nevertheless, it's still a good
exercise for the parallel-checkout framework as the fallback codepath
also writes the queued entries using the parallel-checkout functions
(only without spawning any worker).

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 ci/run-build-and-tests.sh  |  1 +
 parallel-checkout.c        | 14 ++++++++++++++
 t/README                   |  4 ++++
 t/lib-parallel-checkout.sh |  3 +++
 4 files changed, 22 insertions(+)

diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index a66b5e8c75..23b28e7391 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -25,6 +25,7 @@ linux-gcc)
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_WRITE_REV_INDEX=1
+	export GIT_TEST_CHECKOUT_WORKERS=2
 	make test
 	;;
 linux-clang)
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 6fb3f1e6c9..6b1af32bb3 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -35,6 +35,20 @@ static const int DEFAULT_NUM_WORKERS = 1;
 
 void get_parallel_checkout_configs(int *num_workers, int *threshold)
 {
+	char *env_workers = getenv("GIT_TEST_CHECKOUT_WORKERS");
+
+	if (env_workers && *env_workers) {
+		if (strtol_i(env_workers, 10, num_workers)) {
+			die("invalid value for GIT_TEST_CHECKOUT_WORKERS: '%s'",
+			    env_workers);
+		}
+		if (*num_workers < 1)
+			*num_workers = online_cpus();
+
+		*threshold = 0;
+		return;
+	}
+
 	if (git_config_get_int("checkout.workers", num_workers))
 		*num_workers = DEFAULT_NUM_WORKERS;
 	else if (*num_workers < 1)
diff --git a/t/README b/t/README
index fd9375b146..a194488f27 100644
--- a/t/README
+++ b/t/README
@@ -436,6 +436,10 @@ and "sha256".
 GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
 'pack.writeReverseIndex' setting.
 
+GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
+to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
+execution of the parallel-checkout code.
+
 Naming Tests
 ------------
 
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index d6740425b1..21f5759732 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -1,5 +1,8 @@
 # Helpers for tests invoking parallel-checkout
 
+# Parallel checkout tests need full control of the number of workers
+unset GIT_TEST_CHECKOUT_WORKERS
+
 set_checkout_config () {
 	if test $# -ne 2
 	then
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 1/8] make_transient_cache_entry(): optionally alloc from mem_pool
  2021-04-30 21:40   ` [PATCH v2 1/8] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
@ 2021-05-01 17:06     ` Christian Couder
  2021-05-03 14:11       ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 47+ messages in thread
From: Christian Couder @ 2021-05-01 17:06 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, Jeff Hostetler, Derrick Stolee

>  struct cache_entry *make_transient_cache_entry(unsigned int mode,
>                                                const struct object_id *oid,
>                                                const char *path,
> -                                              int stage);
> +                                              int stage, struct mem_pool *mp);

It's a bit strange that `int stage` isn't on its own line here, as
other parameters are. And if line length was the issue, it looks like
it could have been on the same line as `const char *path`.

> -struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct object_id *oid,
> -                                              const char *path, int stage)
> +struct cache_entry *make_transient_cache_entry(unsigned int mode,
> +                                              const struct object_id *oid,
> +                                              const char *path, int stage,

Here also, it's a bit strange that `int stage` isn't on its own line,
as it looks like you want  to put others parameters on their own line.
And this is not consistent with the above declaration.

> +                                              struct mem_pool *mp)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 2/8] builtin/checkout.c: complete parallel checkout support
  2021-04-30 21:40   ` [PATCH v2 2/8] builtin/checkout.c: complete parallel checkout support Matheus Tavares
@ 2021-05-01 17:08     ` Christian Couder
  2021-05-03 14:21       ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 47+ messages in thread
From: Christian Couder @ 2021-05-01 17:08 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, Jeff Hostetler, Derrick Stolee

On Fri, Apr 30, 2021 at 11:40 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Pathspec-limited checkouts (like `git checkout *.txt`) are performed by
> a code path that doesn't yet support parallel checkout because it calls
> checkout_entry() directly, instead of unpack_trees(). Let's add parallel
> checkout support for this code path too.
>
> Note: the transient cache entries allocated in checkout_merged() are now

s/Note: the/The/

> allocated in a mem_pool which is only discarded after parallel checkout
> finishes. This is done because the entries need to be valid when
> run_parallel_checkout() is called.

> -static int checkout_merged(int pos, const struct checkout *state, int *nr_checkouts)
> +static int checkout_merged(int pos, const struct checkout *state,
> +                          int *nr_checkouts, struct mem_pool *ce_mem_pool)

For consistency with the previous patch, maybe: s/ce_mem_pool/ce_mp/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 3/8] checkout-index: add parallel checkout support
  2021-04-30 21:40   ` [PATCH v2 3/8] checkout-index: add " Matheus Tavares
@ 2021-05-01 17:08     ` Christian Couder
  2021-05-03 14:22       ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 47+ messages in thread
From: Christian Couder @ 2021-05-01 17:08 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, Jeff Hostetler, Derrick Stolee

On Fri, Apr 30, 2021 at 11:40 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Note: previously, `checkout_all()` would not return on errors, but

s/Note: previously/Previously/

> instead call `exit()` with a non-zero code. However, it only did that
> after calling `checkout_entry()` for all index entries, thus not
> stopping on the first error, but attempting to write the maximum number
> of entries possible. In order to maintain this behavior we now propagate
> `checkout_all()`s error status to `cmd_checkout_index()`, so that it can
> call `run_parallel_checkout()` and attempt to write the queued entries
> before exiting with the error code.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>

> @@ -142,11 +143,7 @@ static void checkout_all(const char *prefix, int prefix_length)
>         }
>         if (last_ce && to_tempfile)
>                 write_tempfile_record(last_ce->name, prefix);
> -       if (errs)
> -               /* we have already done our error reporting.
> -                * exit with the same code as die().
> -                */
> -               exit(128);

So when there were errors in checkout_all(), we used to exit() with
error code 128 (same as die())...

> @@ -275,12 +277,16 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
>                 strbuf_release(&buf);
>         }
>
> +       if (all)
> +               err |= checkout_all(prefix, prefix_length);
> +
> +       if (pc_workers > 1)
> +               err |= run_parallel_checkout(&state, pc_workers, pc_threshold,
> +                                            NULL, NULL);
> +
>         if (err)
>                 return 1;

...but now it looks like we will exit with error code 1. I see that
you already answered this comment in the previous round of review, but
you didn't add the explanations to the commit message.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 5/8] parallel-checkout: add tests related to path collisions
  2021-04-30 21:40   ` [PATCH v2 5/8] parallel-checkout: add tests related to path collisions Matheus Tavares
@ 2021-05-02  7:59     ` Torsten Bögershausen
  2021-05-03 14:58       ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 47+ messages in thread
From: Torsten Bögershausen @ 2021-05-02  7:59 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, christian.couder, git, stolee

On Fri, Apr 30, 2021 at 06:40:32PM -0300, Matheus Tavares wrote:
> Add tests to confirm that path collisions are properly detected by
> checkout workers, both to avoid race conditions and to report colliding
> entries on clone.
>
> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  parallel-checkout.c                     |   4 +
>  t/lib-parallel-checkout.sh              |   4 +-
>  t/t2081-parallel-checkout-collisions.sh | 162 ++++++++++++++++++++++++
>  3 files changed, 168 insertions(+), 2 deletions(-)
>  create mode 100755 t/t2081-parallel-checkout-collisions.sh
>
> diff --git a/parallel-checkout.c b/parallel-checkout.c
> index 09e8b10a35..6fb3f1e6c9 100644
> --- a/parallel-checkout.c
> +++ b/parallel-checkout.c
> @@ -8,6 +8,7 @@
>  #include "sigchain.h"
>  #include "streaming.h"
>  #include "thread-utils.h"
> +#include "trace2.h"
>
>  struct pc_worker {
>  	struct child_process cp;
> @@ -326,6 +327,7 @@ void write_pc_item(struct parallel_checkout_item *pc_item,
>  	if (dir_sep && !has_dirs_only_path(path.buf, dir_sep - path.buf,
>  					   state->base_dir_len)) {
>  		pc_item->status = PC_ITEM_COLLIDED;
> +		trace2_data_string("pcheckout", NULL, "collision/dirname", path.buf);
>  		goto out;
>  	}
>
> @@ -341,6 +343,8 @@ void write_pc_item(struct parallel_checkout_item *pc_item,
>  			 * call should have already caught these cases.
>  			 */
>  			pc_item->status = PC_ITEM_COLLIDED;
> +			trace2_data_string("pcheckout", NULL,
> +					   "collision/basename", path.buf);
>  		} else {
>  			error_errno("failed to open file '%s'", path.buf);
>  			pc_item->status = PC_ITEM_FAILED;
> diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
> index f60b22ef34..d6740425b1 100644
> --- a/t/lib-parallel-checkout.sh
> +++ b/t/lib-parallel-checkout.sh
> @@ -22,12 +22,12 @@ test_checkout_workers () {
>
>  	local trace_file=trace-test-checkout-workers &&
>  	rm -f "$trace_file" &&
> -	GIT_TRACE2="$(pwd)/$trace_file" "$@" &&
> +	GIT_TRACE2="$(pwd)/$trace_file" "$@" 2>&8 &&
>
>  	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
>  	test $workers -eq $expected_workers &&
>  	rm "$trace_file"
> -}
> +} 8>&2 2>&4
>
>  # Verify that both the working tree and the index were created correctly
>  verify_checkout () {
> diff --git a/t/t2081-parallel-checkout-collisions.sh b/t/t2081-parallel-checkout-collisions.sh
> new file mode 100755
> index 0000000000..f6fcfc0c1e
> --- /dev/null
> +++ b/t/t2081-parallel-checkout-collisions.sh
> @@ -0,0 +1,162 @@
> +#!/bin/sh
> +
> +test_description="path collisions during parallel checkout
> +
> +Parallel checkout must detect path collisions to:
> +
> +1) Avoid racily writing to different paths that represent the same file on disk.
> +2) Report the colliding entries on clone.
> +
> +The tests in this file exercise parallel checkout's collision detection code in
> +both these mechanics.
> +"
> +
> +. ./test-lib.sh
> +. "$TEST_DIRECTORY/lib-parallel-checkout.sh"

Why this $TEST_DIRECTORY ?
Aren't all files under the t/ directory, where test-lib.sh is as well ?
(The $TEST_DIRECTORY macro is used at different places, so I may have missed
something)

> +
> +TEST_ROOT="$PWD"
> +
> +test_expect_success CASE_INSENSITIVE_FS 'setup' '
> +	empty_oid=$(git hash-object -w --stdin </dev/null) &&
> +	cat >objs <<-EOF &&
> +	100644 $empty_oid	FILE_X
> +	100644 $empty_oid	FILE_x
> +	100644 $empty_oid	file_X
> +	100644 $empty_oid	file_x
> +	EOF
> +	git update-index --index-info <objs &&
> +	git commit -m "colliding files" &&
> +	git tag basename_collision &&
> +
> +	write_script "$TEST_ROOT"/logger_script <<-\EOF
> +	echo "$@" >>filter.log
> +	EOF
> +'
> +
> +test_workers_in_event_trace ()
> +{
> +	test $1 -eq $(grep ".event.:.child_start..*checkout--worker" $2 | wc -l)
> +}
> +
> +test_expect_success CASE_INSENSITIVE_FS 'worker detects basename collision' '
> +	GIT_TRACE2_EVENT="$(pwd)/trace" git \
> +		-c checkout.workers=2 -c checkout.thresholdForParallelism=0 \
> +		checkout . &&
> +
> +	test_workers_in_event_trace 2 trace &&
> +	collisions=$(grep -i "category.:.pcheckout.,.key.:.collision/basename.,.value.:.file_x.}" trace | wc -l) &&
> +	test $collisions -eq 3
> +'
> +
> +test_expect_success CASE_INSENSITIVE_FS 'worker detects dirname collision' '
> +	test_config filter.logger.smudge "\"$TEST_ROOT/logger_script\" %f" &&
> +	empty_oid=$(git hash-object -w --stdin </dev/null) &&
> +
> +	# By setting a filter command to "a", we make it ineligible for parallel
> +	# checkout, and thus it is checked out *first*. This way we can ensure
> +	# that "A/B" and "A/C" will both collide with the regular file "a".
> +	#
> +	attr_oid=$(echo "a filter=logger" | git hash-object -w --stdin) &&
> +
> +	cat >objs <<-EOF &&
> +	100644 $empty_oid	A/B
> +	100644 $empty_oid	A/C
> +	100644 $empty_oid	a
> +	100644 $attr_oid	.gitattributes
> +	EOF
> +	git rm -rf . &&
> +	git update-index --index-info <objs &&
> +
> +	rm -f trace filter.log &&
> +	GIT_TRACE2_EVENT="$(pwd)/trace" git \
> +		-c checkout.workers=2 -c checkout.thresholdForParallelism=0 \
> +		checkout . &&
> +
> +	# Check that "a" (and only "a") was filtered
> +	echo a >expected.log &&
> +	test_cmp filter.log expected.log &&
> +
> +	# Check that it used the right number of workers and detected the collisions
> +	test_workers_in_event_trace 2 trace &&
> +	grep "category.:.pcheckout.,.key.:.collision/dirname.,.value.:.A/B.}" trace &&
> +	grep "category.:.pcheckout.,.key.:.collision/dirname.,.value.:.A/C.}" trace
> +'
> +
> +test_expect_success SYMLINKS,CASE_INSENSITIVE_FS 'do not follow symlinks colliding with leading dir' '
> +	empty_oid=$(git hash-object -w --stdin </dev/null) &&
> +	symlink_oid=$(echo "./e" | git hash-object -w --stdin) &&
> +	mkdir e &&
> +
> +	cat >objs <<-EOF &&
> +	120000 $symlink_oid	D
> +	100644 $empty_oid	d/x
> +	100644 $empty_oid	e/y
> +	EOF
> +	git rm -rf . &&
> +	git update-index --index-info <objs &&
> +
> +	set_checkout_config 2 0 &&
> +	test_checkout_workers 2 git checkout . &&
> +	test_path_is_dir e &&
> +	test_path_is_missing e/x
> +'
> +
> +# The two following tests check that parallel checkout correctly reports
> +# colliding entries on clone. The sequential code detects a collision by
> +# calling lstat() before trying to open(O_CREAT) a file. (Note that this only
> +# works for clone.) Then, to find the pair of a colliding item k, it searches
> +# cache_entry[0, k-1]. This is not sufficient in parallel checkout because:
> +#
> +# - A colliding file may be created between the lstat() and open() calls;
> +# - A colliding entry might appear in the second half of the cache_entry array.
> +#
> +test_expect_success CASE_INSENSITIVE_FS 'collision report on clone (w/ racy file creation)' '
> +	git reset --hard basename_collision &&
> +	set_checkout_config 2 0 &&
> +	test_checkout_workers 2 git clone . clone-repo 2>stderr &&
> +
> +	grep FILE_X stderr &&
> +	grep FILE_x stderr &&
> +	grep file_X stderr &&
> +	grep file_x stderr &&
> +	grep "the following paths have collided" stderr
> +'
> +
> +# This test ensures that the collision report code is correctly looking for
> +# colliding peers in the second half of the cache_entry array. This is done by
> +# defining a smudge command for the *last* array entry, which makes it
> +# non-eligible for parallel-checkout. Thus, it is checked out *first*, before
> +# spawning the workers.
> +#
> +# Note: this test doesn't work on Windows because, on this system, the
> +# collision report code uses strcmp() to find the colliding pairs when
> +# core.ignoreCase is false. And we need this setting for this test so that only
> +# 'file_x' matches the pattern of the filter attribute. But the test works on
> +# OSX, where the colliding pairs are found using inode.
> +#
> +test_expect_success CASE_INSENSITIVE_FS,!MINGW,!CYGWIN \
> +	'collision report on clone (w/ colliding peer after the detected entry)' '
> +
> +	test_config_global filter.logger.smudge "\"$TEST_ROOT/logger_script\" %f" &&
> +	git reset --hard basename_collision &&
> +	echo "file_x filter=logger" >.gitattributes &&
> +	git add .gitattributes &&
> +	git commit -m "filter for file_x" &&
> +
> +	rm -rf clone-repo &&
> +	set_checkout_config 2 0 &&
> +	test_checkout_workers 2 \
> +		git -c core.ignoreCase=false clone . clone-repo 2>stderr &&
> +
> +	grep FILE_X stderr &&
> +	grep FILE_x stderr &&
> +	grep file_X stderr &&
> +	grep file_x stderr &&
> +	grep "the following paths have collided" stderr &&
> +
> +	# Check that only "file_x" was filtered
> +	echo file_x >expected.log &&
> +	test_cmp clone-repo/filter.log expected.log
> +'
> +
> +test_done
> --
> 2.30.1
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 0/8] Parallel Checkout (part 3)
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
                     ` (7 preceding siblings ...)
  2021-04-30 21:40   ` [PATCH v2 8/8] ci: run test round with parallel-checkout enabled Matheus Tavares
@ 2021-05-02 10:12   ` Torsten Bögershausen
  2021-05-03 15:01     ` Matheus Tavares Bernardino
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
  9 siblings, 1 reply; 47+ messages in thread
From: Torsten Bögershausen @ 2021-05-02 10:12 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, christian.couder, git, stolee

On Fri, Apr 30, 2021 at 06:40:27PM -0300, Matheus Tavares wrote:

[]

Thank's for the work.
I just make a quick test:

"swapping" git.git between

HEAD is now at e870325ee8 The third batch
and
HEAD is now at 7e39198978 The thirteenth batch

On a Desktop running Linux.
Using the git.git repo, mounted as nfs on an oldish NAS
with spinning disks (and WLAN) gave this timing:

git -c checkout.workers=1
2 minutes, some seconds.

git -c checkout.workers=4
1 minute, some seconds.

git -c checkout.workers=16
40..44 seconds

Not a scientific measurement, just a positive feedback from a real-world.

Good work, I'm looking forward to have this feature in git.git



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 1/8] make_transient_cache_entry(): optionally alloc from mem_pool
  2021-05-01 17:06     ` Christian Couder
@ 2021-05-03 14:11       ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares Bernardino @ 2021-05-03 14:11 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Jeff Hostetler, Derrick Stolee

On Sat, May 1, 2021 at 2:06 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> >  struct cache_entry *make_transient_cache_entry(unsigned int mode,
> >                                                const struct object_id *oid,
> >                                                const char *path,
> > -                                              int stage);
> > +                                              int stage, struct mem_pool *mp);
>
> It's a bit strange that `int stage` isn't on its own line here, as
> other parameters are. And if line length was the issue, it looks like
> it could have been on the same line as `const char *path`.
>
> > -struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct object_id *oid,
> > -                                              const char *path, int stage)
> > +struct cache_entry *make_transient_cache_entry(unsigned int mode,
> > +                                              const struct object_id *oid,
> > +                                              const char *path, int stage,
>
> Here also, it's a bit strange that `int stage` isn't on its own line,
> as it looks like you want  to put others parameters on their own line.
> And this is not consistent with the above declaration.

Thanks. I'll put each parameter in its own line and make this
consistent with the declaration.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 2/8] builtin/checkout.c: complete parallel checkout support
  2021-05-01 17:08     ` Christian Couder
@ 2021-05-03 14:21       ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares Bernardino @ 2021-05-03 14:21 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Jeff Hostetler, Derrick Stolee

On Sat, May 1, 2021 at 2:08 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Fri, Apr 30, 2021 at 11:40 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > Pathspec-limited checkouts (like `git checkout *.txt`) are performed by
> > a code path that doesn't yet support parallel checkout because it calls
> > checkout_entry() directly, instead of unpack_trees(). Let's add parallel
> > checkout support for this code path too.
> >
> > Note: the transient cache entries allocated in checkout_merged() are now
>
> s/Note: the/The/

Thanks

> > allocated in a mem_pool which is only discarded after parallel checkout
> > finishes. This is done because the entries need to be valid when
> > run_parallel_checkout() is called.
>
> > -static int checkout_merged(int pos, const struct checkout *state, int *nr_checkouts)
> > +static int checkout_merged(int pos, const struct checkout *state,
> > +                          int *nr_checkouts, struct mem_pool *ce_mem_pool)
>
> For consistency with the previous patch, maybe: s/ce_mem_pool/ce_mp/

Yeah, I agree that it's a good idea to keep this consistent. In fact,
instead of changing `ce_mem_pool` here, I think I should change `mp`
in the previous patch to either `mem_pool` or `ce_mem_pool`. Those
seem to be the most common names, at read-cache.c, for a memory pool
struct holding cache entries.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 3/8] checkout-index: add parallel checkout support
  2021-05-01 17:08     ` Christian Couder
@ 2021-05-03 14:22       ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares Bernardino @ 2021-05-03 14:22 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Jeff Hostetler, Derrick Stolee

On Sat, May 1, 2021 at 2:08 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Fri, Apr 30, 2021 at 11:40 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > Note: previously, `checkout_all()` would not return on errors, but
>
> s/Note: previously/Previously/
>
> > instead call `exit()` with a non-zero code. However, it only did that
> > after calling `checkout_entry()` for all index entries, thus not
> > stopping on the first error, but attempting to write the maximum number
> > of entries possible. In order to maintain this behavior we now propagate
> > `checkout_all()`s error status to `cmd_checkout_index()`, so that it can
> > call `run_parallel_checkout()` and attempt to write the queued entries
> > before exiting with the error code.
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
>
> > @@ -142,11 +143,7 @@ static void checkout_all(const char *prefix, int prefix_length)
> >         }
> >         if (last_ce && to_tempfile)
> >                 write_tempfile_record(last_ce->name, prefix);
> > -       if (errs)
> > -               /* we have already done our error reporting.
> > -                * exit with the same code as die().
> > -                */
> > -               exit(128);
>
> So when there were errors in checkout_all(), we used to exit() with
> error code 128 (same as die())...
>
> > @@ -275,12 +277,16 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
> >                 strbuf_release(&buf);
> >         }
> >
> > +       if (all)
> > +               err |= checkout_all(prefix, prefix_length);
> > +
> > +       if (pc_workers > 1)
> > +               err |= run_parallel_checkout(&state, pc_workers, pc_threshold,
> > +                                            NULL, NULL);
> > +
> >         if (err)
> >                 return 1;
>
> ...but now it looks like we will exit with error code 1. I see that
> you already answered this comment in the previous round of review, but
> you didn't add the explanations to the commit message.

Oops, good catch! I'll add the explanation for v3. Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 5/8] parallel-checkout: add tests related to path collisions
  2021-05-02  7:59     ` Torsten Bögershausen
@ 2021-05-03 14:58       ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares Bernardino @ 2021-05-03 14:58 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: git, Christian Couder, Jeff Hostetler, Derrick Stolee

On Sun, May 2, 2021 at 4:59 AM Torsten Bögershausen <tboegi@web.de> wrote:
>
> On Fri, Apr 30, 2021 at 06:40:32PM -0300, Matheus Tavares wrote:
> >
> > diff --git a/t/t2081-parallel-checkout-collisions.sh b/t/t2081-parallel-checkout-collisions.sh
> > new file mode 100755
> > index 0000000000..f6fcfc0c1e
> > --- /dev/null
> > +++ b/t/t2081-parallel-checkout-collisions.sh
> > @@ -0,0 +1,162 @@
> > +#!/bin/sh
> > +
> > +test_description="path collisions during parallel checkout
> > +
> > +Parallel checkout must detect path collisions to:
> > +
> > +1) Avoid racily writing to different paths that represent the same file on disk.
> > +2) Report the colliding entries on clone.
> > +
> > +The tests in this file exercise parallel checkout's collision detection code in
> > +both these mechanics.
> > +"
> > +
> > +. ./test-lib.sh
> > +. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
>
> Why this $TEST_DIRECTORY ?
> Aren't all files under the t/ directory, where test-lib.sh is as well ?

Good point. From what I understand, the reason why we need the macro
here is that, when running the test with --root=<path>, the trash dir
might actually be at a path outside `t/`. So the test can't rely on
`./` to read files from `t/`. We don't need the macro for
`test-lib.sh` because the working dir is only changed after it is
sourced (by `test-lib.sh` itself).

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 0/8] Parallel Checkout (part 3)
  2021-05-02 10:12   ` [PATCH v2 0/8] Parallel Checkout (part 3) Torsten Bögershausen
@ 2021-05-03 15:01     ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares Bernardino @ 2021-05-03 15:01 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: git, Christian Couder, Jeff Hostetler, Derrick Stolee

On Sun, May 2, 2021 at 7:12 AM Torsten Bögershausen <tboegi@web.de> wrote:
>
> On Fri, Apr 30, 2021 at 06:40:27PM -0300, Matheus Tavares wrote:
>
> []
>
> Thank's for the work.
> I just make a quick test:
>
> "swapping" git.git between
>
> HEAD is now at e870325ee8 The third batch
> and
> HEAD is now at 7e39198978 The thirteenth batch
>
> On a Desktop running Linux.
> Using the git.git repo, mounted as nfs on an oldish NAS
> with spinning disks (and WLAN) gave this timing:
>
> git -c checkout.workers=1
> 2 minutes, some seconds.
>
> git -c checkout.workers=4
> 1 minute, some seconds.
>
> git -c checkout.workers=16
> 40..44 seconds
>
> Not a scientific measurement, just a positive feedback from a real-world.

Nice! Thanks for testing the parallel version and for your feedback on
it, I appreciate it.

> Good work, I'm looking forward to have this feature in git.git

Thanks :)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 0/8] Parallel Checkout (part 3)
  2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
                     ` (8 preceding siblings ...)
  2021-05-02 10:12   ` [PATCH v2 0/8] Parallel Checkout (part 3) Torsten Bögershausen
@ 2021-05-04 16:27   ` Matheus Tavares
  2021-05-04 16:27     ` [PATCH v3 1/8] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
                       ` (8 more replies)
  9 siblings, 9 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-05-04 16:27 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee, tboegi

This is the last part of the parallel checkout series. It adds tests and
parallel checkout support to `git checkout-index` and
`git checkout <pathspec>`.

I rebased this version onto `master`, as part-2 was merged down to
`master`.

Changes since v2:

* Patch 1: Aligned function parameters in make_transient_cache_entry(),
  and renamed the `struct mem_pool *` argument from `mp` to
  `ce_mem_pool`. This is more consistent with the other functions using
  in read-cache.c and with how we call this function in the next patch.

* Patch 2: Removed unnecessary 'Note: ' from commit message.

* Patch 3: Rewrote commit message to better explain why we unified the
  exit paths for `checkout_all()` and `checkout_file()` modes, and
  changed `git checkout-index --all`s error code from 128 to 1.

Matheus Tavares (8):
  make_transient_cache_entry(): optionally alloc from mem_pool
  builtin/checkout.c: complete parallel checkout support
  checkout-index: add parallel checkout support
  parallel-checkout: add tests for basic operations
  parallel-checkout: add tests related to path collisions
  t0028: extract encoding helpers to lib-encoding.sh
  parallel-checkout: add tests related to .gitattributes
  ci: run test round with parallel-checkout enabled

 builtin/checkout--worker.c              |   2 +-
 builtin/checkout-index.c                |  24 ++-
 builtin/checkout.c                      |  20 ++-
 builtin/difftool.c                      |   2 +-
 cache.h                                 |  14 +-
 ci/run-build-and-tests.sh               |   1 +
 parallel-checkout.c                     |  18 ++
 read-cache.c                            |  14 +-
 t/README                                |   4 +
 t/lib-encoding.sh                       |  25 +++
 t/lib-parallel-checkout.sh              |  45 +++++
 t/t0028-working-tree-encoding.sh        |  25 +--
 t/t2080-parallel-checkout-basics.sh     | 229 ++++++++++++++++++++++++
 t/t2081-parallel-checkout-collisions.sh | 162 +++++++++++++++++
 t/t2082-parallel-checkout-attributes.sh | 194 ++++++++++++++++++++
 unpack-trees.c                          |   2 +-
 16 files changed, 732 insertions(+), 49 deletions(-)
 create mode 100644 t/lib-encoding.sh
 create mode 100644 t/lib-parallel-checkout.sh
 create mode 100755 t/t2080-parallel-checkout-basics.sh
 create mode 100755 t/t2081-parallel-checkout-collisions.sh
 create mode 100755 t/t2082-parallel-checkout-attributes.sh

Range-diff against v2:
1:  f870040bfb ! 1:  bf6c7114aa make_transient_cache_entry(): optionally alloc from mem_pool
    @@ cache.h: struct cache_entry *make_empty_cache_entry(struct index_state *istate,
     - * Create a cache_entry that is not intended to be added to an index.
     - * Caller is responsible for discarding the cache_entry
     - * with `discard_cache_entry`.
    -+ * Create a cache_entry that is not intended to be added to an index. If `mp`
    -+ * is not NULL, the entry is allocated within the given memory pool. Caller is
    -+ * responsible for discarding "loose" entries with `discard_cache_entry()` and
    -+ * the mem_pool with `mem_pool_discard(mp, should_validate_cache_entries())`.
    ++ * Create a cache_entry that is not intended to be added to an index. If
    ++ * `ce_mem_pool` is not NULL, the entry is allocated within the given memory
    ++ * pool. Caller is responsible for discarding "loose" entries with
    ++ * `discard_cache_entry()` and the memory pool with
    ++ * `mem_pool_discard(ce_mem_pool, should_validate_cache_entries())`.
       */
      struct cache_entry *make_transient_cache_entry(unsigned int mode,
      					       const struct object_id *oid,
      					       const char *path,
     -					       int stage);
    -+					       int stage, struct mem_pool *mp);
    ++					       int stage,
    ++					       struct mem_pool *ce_mem_pool);
      
     -struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
    -+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp);
    ++struct cache_entry *make_empty_transient_cache_entry(size_t len,
    ++						     struct mem_pool *ce_mem_pool);
      
      /*
       * Discard cache entry.
    @@ read-cache.c: struct cache_entry *make_empty_cache_entry(struct index_state *ist
      }
      
     -struct cache_entry *make_empty_transient_cache_entry(size_t len)
    -+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp)
    ++struct cache_entry *make_empty_transient_cache_entry(size_t len,
    ++						     struct mem_pool *ce_mem_pool)
      {
    -+	if (mp)
    -+		return mem_pool__ce_calloc(mp, len);
    ++	if (ce_mem_pool)
    ++		return mem_pool__ce_calloc(ce_mem_pool, len);
      	return xcalloc(1, cache_entry_size(len));
      }
      
    @@ read-cache.c: struct cache_entry *make_cache_entry(struct index_state *istate,
     -					       const char *path, int stage)
     +struct cache_entry *make_transient_cache_entry(unsigned int mode,
     +					       const struct object_id *oid,
    -+					       const char *path, int stage,
    -+					       struct mem_pool *mp)
    ++					       const char *path,
    ++					       int stage,
    ++					       struct mem_pool *ce_mem_pool)
      {
      	struct cache_entry *ce;
      	int len;
    @@ read-cache.c: struct cache_entry *make_transient_cache_entry(unsigned int mode,
      
      	len = strlen(path);
     -	ce = make_empty_transient_cache_entry(len);
    -+	ce = make_empty_transient_cache_entry(len, mp);
    ++	ce = make_empty_transient_cache_entry(len, ce_mem_pool);
      
      	oidcpy(&ce->oid, oid);
      	memcpy(ce->name, path, len);
2:  e2d82c4337 ! 2:  e898889787 builtin/checkout.c: complete parallel checkout support
    @@ Commit message
         checkout_entry() directly, instead of unpack_trees(). Let's add parallel
         checkout support for this code path too.
     
    -    Note: the transient cache entries allocated in checkout_merged() are now
    +    The transient cache entries allocated in checkout_merged() are now
         allocated in a mem_pool which is only discarded after parallel checkout
         finishes. This is done because the entries need to be valid when
         run_parallel_checkout() is called.
    @@ builtin/checkout.c: static int checkout_worktree(const struct checkout_opts *opt
      	enable_delayed_checkout(&state);
     +	if (pc_workers > 1)
     +		init_parallel_checkout();
    - 	for (pos = 0; pos < active_nr; pos++) {
    - 		struct cache_entry *ce = active_cache[pos];
    - 		if (ce->ce_flags & CE_MATCHED) {
    + 
    + 	/* TODO: audit for interaction with sparse-index. */
    + 	ensure_full_index(&the_index);
     @@ builtin/checkout.c: static int checkout_worktree(const struct checkout_opts *opts,
      						       &nr_checkouts, opts->overlay_mode);
      			else if (opts->merge)
3:  0fe1a5fabc ! 3:  916f391a46 checkout-index: add parallel checkout support
    @@ Metadata
      ## Commit message ##
         checkout-index: add parallel checkout support
     
    -    Note: previously, `checkout_all()` would not return on errors, but
    -    instead call `exit()` with a non-zero code. However, it only did that
    -    after calling `checkout_entry()` for all index entries, thus not
    -    stopping on the first error, but attempting to write the maximum number
    -    of entries possible. In order to maintain this behavior we now propagate
    -    `checkout_all()`s error status to `cmd_checkout_index()`, so that it can
    -    call `run_parallel_checkout()` and attempt to write the queued entries
    -    before exiting with the error code.
    +    Allow checkout-index to use the parallel checkout framework, honoring
    +    the checkout.workers configuration.
    +
    +    There are two code paths in checkout-index which call
    +    `checkout_entry()`, and thus, can make use of parallel checkout:
    +    `checkout_file()`, which is used to write paths explicitly given at the
    +    command line; and `checkout_all()`, which is used to write all paths in
    +    the index, when the `--all` option is given.
    +
    +    In both operation modes, checkout-index doesn't abort immediately on a
    +    `checkout_entry()` failure. Instead, it tries to check out all remaining
    +    paths before exiting with a non-zero exit code. To keep this behavior
    +    when parallel checkout is being used, we must allow
    +    `run_parallel_checkout()` to try writing the queued entries before we
    +    exit, even if we already got an error code from a previous
    +    `checkout_entry()` call.
    +
    +    However, `checkout_all()` doesn't return on errors, it calls `exit()`
    +    with code 128. We could make it call `run_parallel_checkout()` before
    +    exiting, but it makes the code easier to follow if we unify the exit
    +    path for both checkout-index modes at `cmd_checkout_index()`, and let
    +    this function take care of the interactions with the parallel checkout
    +    API. So let's do that.
    +
    +    With this change, we also have to consider whether we want to keep using
    +    128 as the error code for `git checkout-index --all`, while we use 1 for
    +    `git checkout-index <path>` (even when the actual error is the same).
    +    Since there is not much value in having code 128 only for `--all`, and
    +    there is no mention about it in the docs (so it's unlikely that changing
    +    it will break any existing script), let's make both modes exit with code
    +    1 on `checkout_entry()` errors.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
4:  f604c50dba = 4:  667777053a parallel-checkout: add tests for basic operations
5:  eae6d3a1c1 = 5:  dcb3acab1d parallel-checkout: add tests related to path collisions
6:  9161cd1503 = 6:  6141c46051 t0028: extract encoding helpers to lib-encoding.sh
7:  bc584897e6 = 7:  5350689a30 parallel-checkout: add tests related to .gitattributes
8:  1bc5c523c5 ! 8:  7b2966c488 ci: run test round with parallel-checkout enabled
    @@ parallel-checkout.c: static const int DEFAULT_NUM_WORKERS = 1;
      	else if (*num_workers < 1)
     
      ## t/README ##
    -@@ t/README: and "sha256".
    - GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
    - 'pack.writeReverseIndex' setting.
    +@@ t/README: GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
    + GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the
    + sparse-index format by default.
      
     +GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
     +to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 1/8] make_transient_cache_entry(): optionally alloc from mem_pool
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
@ 2021-05-04 16:27     ` Matheus Tavares
  2021-05-04 16:27     ` [PATCH v3 2/8] builtin/checkout.c: complete parallel checkout support Matheus Tavares
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-05-04 16:27 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee, tboegi

Allow make_transient_cache_entry() to optionally receive a mem_pool
struct in which it should allocate the entry. This will be used in the
following patch, to store some transient entries which should persist
until parallel checkout finishes.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/checkout--worker.c |  2 +-
 builtin/checkout.c         |  2 +-
 builtin/difftool.c         |  2 +-
 cache.h                    | 14 +++++++++-----
 read-cache.c               | 14 ++++++++++----
 unpack-trees.c             |  2 +-
 6 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/builtin/checkout--worker.c b/builtin/checkout--worker.c
index 31e0de2f7e..289a9b8f89 100644
--- a/builtin/checkout--worker.c
+++ b/builtin/checkout--worker.c
@@ -39,7 +39,7 @@ static void packet_to_pc_item(const char *buffer, int len,
 	}
 
 	memset(pc_item, 0, sizeof(*pc_item));
-	pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+	pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len, NULL);
 	pc_item->ce->ce_namelen = fixed_portion->name_len;
 	pc_item->ce->ce_mode = fixed_portion->ce_mode;
 	memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5bd9128d1a..a597f31d53 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -291,7 +291,7 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko
 	if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
 		die(_("Unable to add merge result for '%s'"), path);
 	free(result_buf.ptr);
-	ce = make_transient_cache_entry(mode, &oid, path, 2);
+	ce = make_transient_cache_entry(mode, &oid, path, 2, NULL);
 	if (!ce)
 		die(_("make_cache_entry failed for path '%s'"), path);
 	status = checkout_entry(ce, state, NULL, nr_checkouts);
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 0202a43052..89334b77fb 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -323,7 +323,7 @@ static int checkout_path(unsigned mode, struct object_id *oid,
 	struct cache_entry *ce;
 	int ret;
 
-	ce = make_transient_cache_entry(mode, oid, path, 0);
+	ce = make_transient_cache_entry(mode, oid, path, 0, NULL);
 	ret = checkout_entry(ce, state, NULL, NULL);
 
 	discard_cache_entry(ce);
diff --git a/cache.h b/cache.h
index b785ffb383..0e2b952647 100644
--- a/cache.h
+++ b/cache.h
@@ -370,16 +370,20 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate,
 					   size_t name_len);
 
 /*
- * Create a cache_entry that is not intended to be added to an index.
- * Caller is responsible for discarding the cache_entry
- * with `discard_cache_entry`.
+ * Create a cache_entry that is not intended to be added to an index. If
+ * `ce_mem_pool` is not NULL, the entry is allocated within the given memory
+ * pool. Caller is responsible for discarding "loose" entries with
+ * `discard_cache_entry()` and the memory pool with
+ * `mem_pool_discard(ce_mem_pool, should_validate_cache_entries())`.
  */
 struct cache_entry *make_transient_cache_entry(unsigned int mode,
 					       const struct object_id *oid,
 					       const char *path,
-					       int stage);
+					       int stage,
+					       struct mem_pool *ce_mem_pool);
 
-struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
+struct cache_entry *make_empty_transient_cache_entry(size_t len,
+						     struct mem_pool *ce_mem_pool);
 
 /*
  * Discard cache entry.
diff --git a/read-cache.c b/read-cache.c
index 72a1d339c9..86f95fe62e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -839,8 +839,11 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate, size_t le
 	return mem_pool__ce_calloc(find_mem_pool(istate), len);
 }
 
-struct cache_entry *make_empty_transient_cache_entry(size_t len)
+struct cache_entry *make_empty_transient_cache_entry(size_t len,
+						     struct mem_pool *ce_mem_pool)
 {
+	if (ce_mem_pool)
+		return mem_pool__ce_calloc(ce_mem_pool, len);
 	return xcalloc(1, cache_entry_size(len));
 }
 
@@ -874,8 +877,11 @@ struct cache_entry *make_cache_entry(struct index_state *istate,
 	return ret;
 }
 
-struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct object_id *oid,
-					       const char *path, int stage)
+struct cache_entry *make_transient_cache_entry(unsigned int mode,
+					       const struct object_id *oid,
+					       const char *path,
+					       int stage,
+					       struct mem_pool *ce_mem_pool)
 {
 	struct cache_entry *ce;
 	int len;
@@ -886,7 +892,7 @@ struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct o
 	}
 
 	len = strlen(path);
-	ce = make_empty_transient_cache_entry(len);
+	ce = make_empty_transient_cache_entry(len, ce_mem_pool);
 
 	oidcpy(&ce->oid, oid);
 	memcpy(ce->name, path, len);
diff --git a/unpack-trees.c b/unpack-trees.c
index 7a1804c314..f88a69f8e7 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1038,7 +1038,7 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	size_t len = traverse_path_len(info, tree_entry_len(n));
 	struct cache_entry *ce =
 		is_transient ?
-		make_empty_transient_cache_entry(len) :
+		make_empty_transient_cache_entry(len, NULL) :
 		make_empty_cache_entry(istate, len);
 
 	ce->ce_mode = create_ce_mode(n->mode);
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 2/8] builtin/checkout.c: complete parallel checkout support
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
  2021-05-04 16:27     ` [PATCH v3 1/8] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
@ 2021-05-04 16:27     ` Matheus Tavares
  2021-05-05 13:55       ` Derrick Stolee
  2021-05-04 16:27     ` [PATCH v3 3/8] checkout-index: add " Matheus Tavares
                       ` (6 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Matheus Tavares @ 2021-05-04 16:27 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee, tboegi

Pathspec-limited checkouts (like `git checkout *.txt`) are performed by
a code path that doesn't yet support parallel checkout because it calls
checkout_entry() directly, instead of unpack_trees(). Let's add parallel
checkout support for this code path too.

The transient cache entries allocated in checkout_merged() are now
allocated in a mem_pool which is only discarded after parallel checkout
finishes. This is done because the entries need to be valid when
run_parallel_checkout() is called.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/checkout.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index a597f31d53..2632dd9eff 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -27,6 +27,7 @@
 #include "wt-status.h"
 #include "xdiff-interface.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 static const char * const checkout_usage[] = {
 	N_("git checkout [<options>] <branch>"),
@@ -230,7 +231,8 @@ static int checkout_stage(int stage, const struct cache_entry *ce, int pos,
 		return error(_("path '%s' does not have their version"), ce->name);
 }
 
-static int checkout_merged(int pos, const struct checkout *state, int *nr_checkouts)
+static int checkout_merged(int pos, const struct checkout *state,
+			   int *nr_checkouts, struct mem_pool *ce_mem_pool)
 {
 	struct cache_entry *ce = active_cache[pos];
 	const char *path = ce->name;
@@ -291,11 +293,10 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko
 	if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
 		die(_("Unable to add merge result for '%s'"), path);
 	free(result_buf.ptr);
-	ce = make_transient_cache_entry(mode, &oid, path, 2, NULL);
+	ce = make_transient_cache_entry(mode, &oid, path, 2, ce_mem_pool);
 	if (!ce)
 		die(_("make_cache_entry failed for path '%s'"), path);
 	status = checkout_entry(ce, state, NULL, nr_checkouts);
-	discard_cache_entry(ce);
 	return status;
 }
 
@@ -359,16 +360,22 @@ static int checkout_worktree(const struct checkout_opts *opts,
 	int nr_checkouts = 0, nr_unmerged = 0;
 	int errs = 0;
 	int pos;
+	int pc_workers, pc_threshold;
+	struct mem_pool ce_mem_pool;
 
 	state.force = 1;
 	state.refresh_cache = 1;
 	state.istate = &the_index;
 
+	mem_pool_init(&ce_mem_pool, 0);
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
 	init_checkout_metadata(&state.meta, info->refname,
 			       info->commit ? &info->commit->object.oid : &info->oid,
 			       NULL);
 
 	enable_delayed_checkout(&state);
+	if (pc_workers > 1)
+		init_parallel_checkout();
 
 	/* TODO: audit for interaction with sparse-index. */
 	ensure_full_index(&the_index);
@@ -387,10 +394,15 @@ static int checkout_worktree(const struct checkout_opts *opts,
 						       &nr_checkouts, opts->overlay_mode);
 			else if (opts->merge)
 				errs |= checkout_merged(pos, &state,
-							&nr_unmerged);
+							&nr_unmerged,
+							&ce_mem_pool);
 			pos = skip_same_name(ce, pos) - 1;
 		}
 	}
+	if (pc_workers > 1)
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+					      NULL, NULL);
+	mem_pool_discard(&ce_mem_pool, should_validate_cache_entries());
 	remove_marked_cache_entries(&the_index, 1);
 	remove_scheduled_dirs();
 	errs |= finish_delayed_checkout(&state, &nr_checkouts);
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 3/8] checkout-index: add parallel checkout support
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
  2021-05-04 16:27     ` [PATCH v3 1/8] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
  2021-05-04 16:27     ` [PATCH v3 2/8] builtin/checkout.c: complete parallel checkout support Matheus Tavares
@ 2021-05-04 16:27     ` Matheus Tavares
  2021-05-04 16:27     ` [PATCH v3 4/8] parallel-checkout: add tests for basic operations Matheus Tavares
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-05-04 16:27 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee, tboegi

Allow checkout-index to use the parallel checkout framework, honoring
the checkout.workers configuration.

There are two code paths in checkout-index which call
`checkout_entry()`, and thus, can make use of parallel checkout:
`checkout_file()`, which is used to write paths explicitly given at the
command line; and `checkout_all()`, which is used to write all paths in
the index, when the `--all` option is given.

In both operation modes, checkout-index doesn't abort immediately on a
`checkout_entry()` failure. Instead, it tries to check out all remaining
paths before exiting with a non-zero exit code. To keep this behavior
when parallel checkout is being used, we must allow
`run_parallel_checkout()` to try writing the queued entries before we
exit, even if we already got an error code from a previous
`checkout_entry()` call.

However, `checkout_all()` doesn't return on errors, it calls `exit()`
with code 128. We could make it call `run_parallel_checkout()` before
exiting, but it makes the code easier to follow if we unify the exit
path for both checkout-index modes at `cmd_checkout_index()`, and let
this function take care of the interactions with the parallel checkout
API. So let's do that.

With this change, we also have to consider whether we want to keep using
128 as the error code for `git checkout-index --all`, while we use 1 for
`git checkout-index <path>` (even when the actual error is the same).
Since there is not much value in having code 128 only for `--all`, and
there is no mention about it in the docs (so it's unlikely that changing
it will break any existing script), let's make both modes exit with code
1 on `checkout_entry()` errors.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/checkout-index.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index c9a3c71914..e21620d964 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -12,6 +12,7 @@
 #include "cache-tree.h"
 #include "parse-options.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 #define CHECKOUT_ALL 4
 static int nul_term_line;
@@ -115,7 +116,7 @@ static int checkout_file(const char *name, const char *prefix)
 	return -1;
 }
 
-static void checkout_all(const char *prefix, int prefix_length)
+static int checkout_all(const char *prefix, int prefix_length)
 {
 	int i, errs = 0;
 	struct cache_entry *last_ce = NULL;
@@ -144,11 +145,7 @@ static void checkout_all(const char *prefix, int prefix_length)
 	}
 	if (last_ce && to_tempfile)
 		write_tempfile_record(last_ce->name, prefix);
-	if (errs)
-		/* we have already done our error reporting.
-		 * exit with the same code as die().
-		 */
-		exit(128);
+	return !!errs;
 }
 
 static const char * const builtin_checkout_index_usage[] = {
@@ -184,6 +181,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 	int force = 0, quiet = 0, not_new = 0;
 	int index_opt = 0;
 	int err = 0;
+	int pc_workers, pc_threshold;
 	struct option builtin_checkout_index_options[] = {
 		OPT_BOOL('a', "all", &all,
 			N_("check out all files in the index")),
@@ -238,6 +236,10 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 		hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 	}
 
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+	if (pc_workers > 1)
+		init_parallel_checkout();
+
 	/* Check out named files first */
 	for (i = 0; i < argc; i++) {
 		const char *arg = argv[i];
@@ -277,12 +279,16 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	if (all)
+		err |= checkout_all(prefix, prefix_length);
+
+	if (pc_workers > 1)
+		err |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+					     NULL, NULL);
+
 	if (err)
 		return 1;
 
-	if (all)
-		checkout_all(prefix, prefix_length);
-
 	if (is_lock_file_locked(&lock_file) &&
 	    write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die("Unable to write new index file");
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 4/8] parallel-checkout: add tests for basic operations
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
                       ` (2 preceding siblings ...)
  2021-05-04 16:27     ` [PATCH v3 3/8] checkout-index: add " Matheus Tavares
@ 2021-05-04 16:27     ` Matheus Tavares
  2021-05-04 16:27     ` [PATCH v3 5/8] parallel-checkout: add tests related to path collisions Matheus Tavares
                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-05-04 16:27 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee, tboegi

Add tests to populate the working tree during clone and checkout using
sequential and parallel mode, to confirm that they produce identical
results. Also test basic checkout mechanics, such as checking for
symlinks in the leading directories and the abidance to --force.

Note: some helper functions are added to a common lib file which is only
included by t2080 for now. But they will also be used by other
parallel-checkout tests in the following patches.

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/lib-parallel-checkout.sh          |  42 +++++
 t/t2080-parallel-checkout-basics.sh | 229 ++++++++++++++++++++++++++++
 2 files changed, 271 insertions(+)
 create mode 100644 t/lib-parallel-checkout.sh
 create mode 100755 t/t2080-parallel-checkout-basics.sh

diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
new file mode 100644
index 0000000000..f60b22ef34
--- /dev/null
+++ b/t/lib-parallel-checkout.sh
@@ -0,0 +1,42 @@
+# Helpers for tests invoking parallel-checkout
+
+set_checkout_config () {
+	if test $# -ne 2
+	then
+		BUG "usage: set_checkout_config <workers> <threshold>"
+	fi &&
+
+	test_config_global checkout.workers $1 &&
+	test_config_global checkout.thresholdForParallelism $2
+}
+
+# Run "${@:2}" and check that $1 checkout workers were used
+test_checkout_workers () {
+	if test $# -lt 2
+	then
+		BUG "too few arguments to test_checkout_workers"
+	fi &&
+
+	local expected_workers=$1 &&
+	shift &&
+
+	local trace_file=trace-test-checkout-workers &&
+	rm -f "$trace_file" &&
+	GIT_TRACE2="$(pwd)/$trace_file" "$@" &&
+
+	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
+	test $workers -eq $expected_workers &&
+	rm "$trace_file"
+}
+
+# Verify that both the working tree and the index were created correctly
+verify_checkout () {
+	if test $# -ne 1
+	then
+		BUG "usage: verify_checkout <repository path>"
+	fi &&
+
+	git -C "$1" diff-index --ignore-submodules=none --exit-code HEAD -- &&
+	git -C "$1" status --porcelain >"$1".status &&
+	test_must_be_empty "$1".status
+}
diff --git a/t/t2080-parallel-checkout-basics.sh b/t/t2080-parallel-checkout-basics.sh
new file mode 100755
index 0000000000..7087818550
--- /dev/null
+++ b/t/t2080-parallel-checkout-basics.sh
@@ -0,0 +1,229 @@
+#!/bin/sh
+
+test_description='parallel-checkout basics
+
+Ensure that parallel-checkout basically works on clone and checkout, spawning
+the required number of workers and correctly populating both the index and the
+working tree.
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+# Test parallel-checkout with a branch switch containing a variety of file
+# creations, deletions, and modifications, involving different entry types.
+# The branches B1 and B2 have the following paths:
+#
+#      B1                 B2
+#  a/a (file)         a   (file)
+#  b   (file)         b/b (file)
+#
+#  c/c (file)         c   (symlink)
+#  d   (symlink)      d/d (file)
+#
+#  e/e (file)         e   (submodule)
+#  f   (submodule)    f/f (file)
+#
+#  g   (submodule)    g   (symlink)
+#  h   (symlink)      h   (submodule)
+#
+# Additionally, the following paths are present on both branches, but with
+# different contents:
+#
+#  i   (file)         i   (file)
+#  j   (symlink)      j   (symlink)
+#  k   (submodule)    k   (submodule)
+#
+# And the following paths are only present in one of the branches:
+#
+#  l/l (file)         -
+#  -                  m/m (file)
+#
+test_expect_success 'setup repo for checkout with various types of changes' '
+	git init sub &&
+	(
+		cd sub &&
+		git checkout -b B2 &&
+		echo B2 >file &&
+		git add file &&
+		git commit -m file &&
+
+		git checkout -b B1 &&
+		echo B1 >file &&
+		git add file &&
+		git commit -m file
+	) &&
+
+	git init various &&
+	(
+		cd various &&
+
+		git checkout -b B1 &&
+		mkdir a c e &&
+		echo a/a >a/a &&
+		echo b >b &&
+		echo c/c >c/c &&
+		test_ln_s_add c d &&
+		echo e/e >e/e &&
+		git submodule add ../sub f &&
+		git submodule add ../sub g &&
+		test_ln_s_add c h &&
+
+		echo "B1 i" >i &&
+		test_ln_s_add c j &&
+		git submodule add -b B1 ../sub k &&
+		mkdir l &&
+		echo l/l >l/l &&
+
+		git add . &&
+		git commit -m B1 &&
+
+		git checkout -b B2 &&
+		git rm -rf :^.gitmodules :^k &&
+		mkdir b d f &&
+		echo a >a &&
+		echo b/b >b/b &&
+		test_ln_s_add b c &&
+		echo d/d >d/d &&
+		git submodule add ../sub e &&
+		echo f/f >f/f &&
+		test_ln_s_add b g &&
+		git submodule add ../sub h &&
+
+		echo "B2 i" >i &&
+		test_ln_s_add b j &&
+		git -C k checkout B2 &&
+		mkdir m &&
+		echo m/m >m/m &&
+
+		git add . &&
+		git commit -m B2 &&
+
+		git checkout --recurse-submodules B1
+	)
+'
+
+for mode in sequential parallel sequential-fallback
+do
+	case $mode in
+	sequential)          workers=1 threshold=0 expected_workers=0 ;;
+	parallel)            workers=2 threshold=0 expected_workers=2 ;;
+	sequential-fallback) workers=2 threshold=100 expected_workers=0 ;;
+	esac
+
+	test_expect_success "$mode checkout" '
+		repo=various_$mode &&
+		cp -R various $repo &&
+
+		# The just copied files have more recent timestamps than their
+		# associated index entries. So refresh the cached timestamps
+		# to avoid an "entry not up-to-date" error from `git checkout`.
+		# We only have to do this for the submodules as `git checkout`
+		# will already refresh the superproject index before performing
+		# the up-to-date check.
+		#
+		git -C $repo submodule foreach "git update-index --refresh" &&
+
+		set_checkout_config $workers $threshold &&
+		test_checkout_workers $expected_workers \
+			git -C $repo checkout --recurse-submodules B2 &&
+		verify_checkout $repo
+	'
+done
+
+for mode in parallel sequential-fallback
+do
+	case $mode in
+	parallel)            workers=2 threshold=0 expected_workers=2 ;;
+	sequential-fallback) workers=2 threshold=100 expected_workers=0 ;;
+	esac
+
+	test_expect_success "$mode checkout on clone" '
+		repo=various_${mode}_clone &&
+		set_checkout_config $workers $threshold &&
+		test_checkout_workers $expected_workers \
+			git clone --recurse-submodules --branch B2 various $repo &&
+		verify_checkout $repo
+	'
+done
+
+# Just to be paranoid, actually compare the working trees' contents directly.
+test_expect_success 'compare the working trees' '
+	rm -rf various_*/.git &&
+	rm -rf various_*/*/.git &&
+
+	# We use `git diff` instead of `diff -r` because the latter would
+	# follow symlinks, and not all `diff` implementations support the
+	# `--no-dereference` option.
+	#
+	git diff --no-index various_sequential various_parallel &&
+	git diff --no-index various_sequential various_parallel_clone &&
+	git diff --no-index various_sequential various_sequential-fallback &&
+	git diff --no-index various_sequential various_sequential-fallback_clone
+'
+
+# Currently, each submodule is checked out in a separated child process, but
+# these subprocesses must also be able to use parallel checkout workers to
+# write the submodules' entries.
+test_expect_success 'submodules can use parallel checkout' '
+	set_checkout_config 2 0 &&
+	git init super &&
+	(
+		cd super &&
+		git init sub &&
+		test_commit -C sub A &&
+		test_commit -C sub B &&
+		git submodule add ./sub &&
+		git commit -m sub &&
+		rm sub/* &&
+		test_checkout_workers 2 git checkout --recurse-submodules .
+	)
+'
+
+test_expect_success 'parallel checkout respects --[no]-force' '
+	set_checkout_config 2 0 &&
+	git init dirty &&
+	(
+		cd dirty &&
+		mkdir D &&
+		test_commit D/F &&
+		test_commit F &&
+
+		rm -rf D &&
+		echo changed >D &&
+		echo changed >F.t &&
+
+		# We expect 0 workers because there is nothing to be done
+		test_checkout_workers 0 git checkout HEAD &&
+		test_path_is_file D &&
+		grep changed D &&
+		grep changed F.t &&
+
+		test_checkout_workers 2 git checkout --force HEAD &&
+		test_path_is_dir D &&
+		grep D/F D/F.t &&
+		grep F F.t
+	)
+'
+
+test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading dirs' '
+	set_checkout_config 2 0 &&
+	git init symlinks &&
+	(
+		cd symlinks &&
+		mkdir D untracked &&
+		# Commit 2 files to have enough work for 2 parallel workers
+		test_commit D/A &&
+		test_commit D/B &&
+		rm -rf D &&
+		ln -s untracked D &&
+
+		test_checkout_workers 2 git checkout --force HEAD &&
+		! test -h D &&
+		grep D/A D/A.t &&
+		grep D/B D/B.t
+	)
+'
+
+test_done
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 5/8] parallel-checkout: add tests related to path collisions
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
                       ` (3 preceding siblings ...)
  2021-05-04 16:27     ` [PATCH v3 4/8] parallel-checkout: add tests for basic operations Matheus Tavares
@ 2021-05-04 16:27     ` Matheus Tavares
  2021-05-04 16:27     ` [PATCH v3 6/8] t0028: extract encoding helpers to lib-encoding.sh Matheus Tavares
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-05-04 16:27 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee, tboegi

Add tests to confirm that path collisions are properly detected by
checkout workers, both to avoid race conditions and to report colliding
entries on clone.

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 parallel-checkout.c                     |   4 +
 t/lib-parallel-checkout.sh              |   4 +-
 t/t2081-parallel-checkout-collisions.sh | 162 ++++++++++++++++++++++++
 3 files changed, 168 insertions(+), 2 deletions(-)
 create mode 100755 t/t2081-parallel-checkout-collisions.sh

diff --git a/parallel-checkout.c b/parallel-checkout.c
index 09e8b10a35..6fb3f1e6c9 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -8,6 +8,7 @@
 #include "sigchain.h"
 #include "streaming.h"
 #include "thread-utils.h"
+#include "trace2.h"
 
 struct pc_worker {
 	struct child_process cp;
@@ -326,6 +327,7 @@ void write_pc_item(struct parallel_checkout_item *pc_item,
 	if (dir_sep && !has_dirs_only_path(path.buf, dir_sep - path.buf,
 					   state->base_dir_len)) {
 		pc_item->status = PC_ITEM_COLLIDED;
+		trace2_data_string("pcheckout", NULL, "collision/dirname", path.buf);
 		goto out;
 	}
 
@@ -341,6 +343,8 @@ void write_pc_item(struct parallel_checkout_item *pc_item,
 			 * call should have already caught these cases.
 			 */
 			pc_item->status = PC_ITEM_COLLIDED;
+			trace2_data_string("pcheckout", NULL,
+					   "collision/basename", path.buf);
 		} else {
 			error_errno("failed to open file '%s'", path.buf);
 			pc_item->status = PC_ITEM_FAILED;
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index f60b22ef34..d6740425b1 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -22,12 +22,12 @@ test_checkout_workers () {
 
 	local trace_file=trace-test-checkout-workers &&
 	rm -f "$trace_file" &&
-	GIT_TRACE2="$(pwd)/$trace_file" "$@" &&
+	GIT_TRACE2="$(pwd)/$trace_file" "$@" 2>&8 &&
 
 	local workers=$(grep "child_start\[..*\] git checkout--worker" "$trace_file" | wc -l) &&
 	test $workers -eq $expected_workers &&
 	rm "$trace_file"
-}
+} 8>&2 2>&4
 
 # Verify that both the working tree and the index were created correctly
 verify_checkout () {
diff --git a/t/t2081-parallel-checkout-collisions.sh b/t/t2081-parallel-checkout-collisions.sh
new file mode 100755
index 0000000000..f6fcfc0c1e
--- /dev/null
+++ b/t/t2081-parallel-checkout-collisions.sh
@@ -0,0 +1,162 @@
+#!/bin/sh
+
+test_description="path collisions during parallel checkout
+
+Parallel checkout must detect path collisions to:
+
+1) Avoid racily writing to different paths that represent the same file on disk.
+2) Report the colliding entries on clone.
+
+The tests in this file exercise parallel checkout's collision detection code in
+both these mechanics.
+"
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+TEST_ROOT="$PWD"
+
+test_expect_success CASE_INSENSITIVE_FS 'setup' '
+	empty_oid=$(git hash-object -w --stdin </dev/null) &&
+	cat >objs <<-EOF &&
+	100644 $empty_oid	FILE_X
+	100644 $empty_oid	FILE_x
+	100644 $empty_oid	file_X
+	100644 $empty_oid	file_x
+	EOF
+	git update-index --index-info <objs &&
+	git commit -m "colliding files" &&
+	git tag basename_collision &&
+
+	write_script "$TEST_ROOT"/logger_script <<-\EOF
+	echo "$@" >>filter.log
+	EOF
+'
+
+test_workers_in_event_trace ()
+{
+	test $1 -eq $(grep ".event.:.child_start..*checkout--worker" $2 | wc -l)
+}
+
+test_expect_success CASE_INSENSITIVE_FS 'worker detects basename collision' '
+	GIT_TRACE2_EVENT="$(pwd)/trace" git \
+		-c checkout.workers=2 -c checkout.thresholdForParallelism=0 \
+		checkout . &&
+
+	test_workers_in_event_trace 2 trace &&
+	collisions=$(grep -i "category.:.pcheckout.,.key.:.collision/basename.,.value.:.file_x.}" trace | wc -l) &&
+	test $collisions -eq 3
+'
+
+test_expect_success CASE_INSENSITIVE_FS 'worker detects dirname collision' '
+	test_config filter.logger.smudge "\"$TEST_ROOT/logger_script\" %f" &&
+	empty_oid=$(git hash-object -w --stdin </dev/null) &&
+
+	# By setting a filter command to "a", we make it ineligible for parallel
+	# checkout, and thus it is checked out *first*. This way we can ensure
+	# that "A/B" and "A/C" will both collide with the regular file "a".
+	#
+	attr_oid=$(echo "a filter=logger" | git hash-object -w --stdin) &&
+
+	cat >objs <<-EOF &&
+	100644 $empty_oid	A/B
+	100644 $empty_oid	A/C
+	100644 $empty_oid	a
+	100644 $attr_oid	.gitattributes
+	EOF
+	git rm -rf . &&
+	git update-index --index-info <objs &&
+
+	rm -f trace filter.log &&
+	GIT_TRACE2_EVENT="$(pwd)/trace" git \
+		-c checkout.workers=2 -c checkout.thresholdForParallelism=0 \
+		checkout . &&
+
+	# Check that "a" (and only "a") was filtered
+	echo a >expected.log &&
+	test_cmp filter.log expected.log &&
+
+	# Check that it used the right number of workers and detected the collisions
+	test_workers_in_event_trace 2 trace &&
+	grep "category.:.pcheckout.,.key.:.collision/dirname.,.value.:.A/B.}" trace &&
+	grep "category.:.pcheckout.,.key.:.collision/dirname.,.value.:.A/C.}" trace
+'
+
+test_expect_success SYMLINKS,CASE_INSENSITIVE_FS 'do not follow symlinks colliding with leading dir' '
+	empty_oid=$(git hash-object -w --stdin </dev/null) &&
+	symlink_oid=$(echo "./e" | git hash-object -w --stdin) &&
+	mkdir e &&
+
+	cat >objs <<-EOF &&
+	120000 $symlink_oid	D
+	100644 $empty_oid	d/x
+	100644 $empty_oid	e/y
+	EOF
+	git rm -rf . &&
+	git update-index --index-info <objs &&
+
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 git checkout . &&
+	test_path_is_dir e &&
+	test_path_is_missing e/x
+'
+
+# The two following tests check that parallel checkout correctly reports
+# colliding entries on clone. The sequential code detects a collision by
+# calling lstat() before trying to open(O_CREAT) a file. (Note that this only
+# works for clone.) Then, to find the pair of a colliding item k, it searches
+# cache_entry[0, k-1]. This is not sufficient in parallel checkout because:
+#
+# - A colliding file may be created between the lstat() and open() calls;
+# - A colliding entry might appear in the second half of the cache_entry array.
+#
+test_expect_success CASE_INSENSITIVE_FS 'collision report on clone (w/ racy file creation)' '
+	git reset --hard basename_collision &&
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 git clone . clone-repo 2>stderr &&
+
+	grep FILE_X stderr &&
+	grep FILE_x stderr &&
+	grep file_X stderr &&
+	grep file_x stderr &&
+	grep "the following paths have collided" stderr
+'
+
+# This test ensures that the collision report code is correctly looking for
+# colliding peers in the second half of the cache_entry array. This is done by
+# defining a smudge command for the *last* array entry, which makes it
+# non-eligible for parallel-checkout. Thus, it is checked out *first*, before
+# spawning the workers.
+#
+# Note: this test doesn't work on Windows because, on this system, the
+# collision report code uses strcmp() to find the colliding pairs when
+# core.ignoreCase is false. And we need this setting for this test so that only
+# 'file_x' matches the pattern of the filter attribute. But the test works on
+# OSX, where the colliding pairs are found using inode.
+#
+test_expect_success CASE_INSENSITIVE_FS,!MINGW,!CYGWIN \
+	'collision report on clone (w/ colliding peer after the detected entry)' '
+
+	test_config_global filter.logger.smudge "\"$TEST_ROOT/logger_script\" %f" &&
+	git reset --hard basename_collision &&
+	echo "file_x filter=logger" >.gitattributes &&
+	git add .gitattributes &&
+	git commit -m "filter for file_x" &&
+
+	rm -rf clone-repo &&
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 \
+		git -c core.ignoreCase=false clone . clone-repo 2>stderr &&
+
+	grep FILE_X stderr &&
+	grep FILE_x stderr &&
+	grep file_X stderr &&
+	grep file_x stderr &&
+	grep "the following paths have collided" stderr &&
+
+	# Check that only "file_x" was filtered
+	echo file_x >expected.log &&
+	test_cmp clone-repo/filter.log expected.log
+'
+
+test_done
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 6/8] t0028: extract encoding helpers to lib-encoding.sh
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
                       ` (4 preceding siblings ...)
  2021-05-04 16:27     ` [PATCH v3 5/8] parallel-checkout: add tests related to path collisions Matheus Tavares
@ 2021-05-04 16:27     ` Matheus Tavares
  2021-05-04 16:27     ` [PATCH v3 7/8] parallel-checkout: add tests related to .gitattributes Matheus Tavares
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-05-04 16:27 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee, tboegi

The following patch will add tests outside t0028 which will also need to
re-encode some strings. Extract the auxiliary encoding functions from
t0028 to a common lib file so that they can be reused.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/lib-encoding.sh                | 25 +++++++++++++++++++++++++
 t/t0028-working-tree-encoding.sh | 25 +------------------------
 2 files changed, 26 insertions(+), 24 deletions(-)
 create mode 100644 t/lib-encoding.sh

diff --git a/t/lib-encoding.sh b/t/lib-encoding.sh
new file mode 100644
index 0000000000..2dabc8c73e
--- /dev/null
+++ b/t/lib-encoding.sh
@@ -0,0 +1,25 @@
+# Encoding helpers
+
+test_lazy_prereq NO_UTF16_BOM '
+	test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
+'
+
+test_lazy_prereq NO_UTF32_BOM '
+	test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
+'
+
+write_utf16 () {
+	if test_have_prereq NO_UTF16_BOM
+	then
+		printf '\376\377'
+	fi &&
+	iconv -f UTF-8 -t UTF-16
+}
+
+write_utf32 () {
+	if test_have_prereq NO_UTF32_BOM
+	then
+		printf '\0\0\376\377'
+	fi &&
+	iconv -f UTF-8 -t UTF-32
+}
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index f970a9806b..82905a2156 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -6,33 +6,10 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY/lib-encoding.sh"
 
 GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
 
-test_lazy_prereq NO_UTF16_BOM '
-	test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
-'
-
-test_lazy_prereq NO_UTF32_BOM '
-	test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
-'
-
-write_utf16 () {
-	if test_have_prereq NO_UTF16_BOM
-	then
-		printf '\376\377'
-	fi &&
-	iconv -f UTF-8 -t UTF-16
-}
-
-write_utf32 () {
-	if test_have_prereq NO_UTF32_BOM
-	then
-		printf '\0\0\376\377'
-	fi &&
-	iconv -f UTF-8 -t UTF-32
-}
-
 test_expect_success 'setup test files' '
 	git config core.eol lf &&
 
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 7/8] parallel-checkout: add tests related to .gitattributes
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
                       ` (5 preceding siblings ...)
  2021-05-04 16:27     ` [PATCH v3 6/8] t0028: extract encoding helpers to lib-encoding.sh Matheus Tavares
@ 2021-05-04 16:27     ` Matheus Tavares
  2021-05-04 16:27     ` [PATCH v3 8/8] ci: run test round with parallel-checkout enabled Matheus Tavares
  2021-05-05 13:57     ` [PATCH v3 0/8] Parallel Checkout (part 3) Derrick Stolee
  8 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-05-04 16:27 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee, tboegi

Add tests to confirm that the `struct conv_attrs` data is correctly
passed from the main process to the workers, and that they can properly
convert the blobs before writing them to the working tree.

Also check that parallel-ineligible entries, such as regular files that
require external filters, are correctly smudge and written when
parallel-checkout is enabled.

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/t2082-parallel-checkout-attributes.sh | 194 ++++++++++++++++++++++++
 1 file changed, 194 insertions(+)
 create mode 100755 t/t2082-parallel-checkout-attributes.sh

diff --git a/t/t2082-parallel-checkout-attributes.sh b/t/t2082-parallel-checkout-attributes.sh
new file mode 100755
index 0000000000..2525457961
--- /dev/null
+++ b/t/t2082-parallel-checkout-attributes.sh
@@ -0,0 +1,194 @@
+#!/bin/sh
+
+test_description='parallel-checkout: attributes
+
+Verify that parallel-checkout correctly creates files that require
+conversions, as specified in .gitattributes. The main point here is
+to check that the conv_attr data is correctly sent to the workers
+and that it contains sufficient information to smudge files
+properly (without access to the index or attribute stack).
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+. "$TEST_DIRECTORY/lib-encoding.sh"
+
+test_expect_success 'parallel-checkout with ident' '
+	set_checkout_config 2 0 &&
+	git init ident &&
+	(
+		cd ident &&
+		echo "A ident" >.gitattributes &&
+		echo "\$Id\$" >A &&
+		echo "\$Id\$" >B &&
+		git add -A &&
+		git commit -m id &&
+
+		rm A B &&
+		test_checkout_workers 2 git reset --hard &&
+		hexsz=$(test_oid hexsz) &&
+		grep -E "\\\$Id: [0-9a-f]{$hexsz} \\\$" A &&
+		grep "\\\$Id\\\$" B
+	)
+'
+
+test_expect_success 'parallel-checkout with re-encoding' '
+	set_checkout_config 2 0 &&
+	git init encoding &&
+	(
+		cd encoding &&
+		echo text >utf8-text &&
+		write_utf16 <utf8-text >utf16-text &&
+
+		echo "A working-tree-encoding=UTF-16" >.gitattributes &&
+		cp utf16-text A &&
+		cp utf8-text B &&
+		git add A B .gitattributes &&
+		git commit -m encoding &&
+
+		# Check that A is stored in UTF-8
+		git cat-file -p :A >A.internal &&
+		test_cmp_bin utf8-text A.internal &&
+
+		rm A B &&
+		test_checkout_workers 2 git checkout A B &&
+
+		# Check that A (and only A) is re-encoded during checkout
+		test_cmp_bin utf16-text A &&
+		test_cmp_bin utf8-text B
+	)
+'
+
+test_expect_success 'parallel-checkout with eol conversions' '
+	set_checkout_config 2 0 &&
+	git init eol &&
+	(
+		cd eol &&
+		printf "multi\r\nline\r\ntext" >crlf-text &&
+		printf "multi\nline\ntext" >lf-text &&
+
+		git config core.autocrlf false &&
+		echo "A eol=crlf" >.gitattributes &&
+		cp crlf-text A &&
+		cp lf-text B &&
+		git add A B .gitattributes &&
+		git commit -m eol &&
+
+		# Check that A is stored with LF format
+		git cat-file -p :A >A.internal &&
+		test_cmp_bin lf-text A.internal &&
+
+		rm A B &&
+		test_checkout_workers 2 git checkout A B &&
+
+		# Check that A (and only A) is converted to CRLF during checkout
+		test_cmp_bin crlf-text A &&
+		test_cmp_bin lf-text B
+	)
+'
+
+# Entries that require an external filter are not eligible for parallel
+# checkout. Check that both the parallel-eligible and non-eligible entries are
+# properly writen in a single checkout operation.
+#
+test_expect_success 'parallel-checkout and external filter' '
+	set_checkout_config 2 0 &&
+	git init filter &&
+	(
+		cd filter &&
+		write_script <<-\EOF rot13.sh &&
+		tr \
+		  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" \
+		  "nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM"
+		EOF
+
+		git config filter.rot13.clean "\"$(pwd)/rot13.sh\"" &&
+		git config filter.rot13.smudge "\"$(pwd)/rot13.sh\"" &&
+		git config filter.rot13.required true &&
+
+		echo abcd >original &&
+		echo nopq >rot13 &&
+
+		echo "A filter=rot13" >.gitattributes &&
+		cp original A &&
+		cp original B &&
+		cp original C &&
+		git add A B C .gitattributes &&
+		git commit -m filter &&
+
+		# Check that A (and only A) was cleaned
+		git cat-file -p :A >A.internal &&
+		test_cmp rot13 A.internal &&
+		git cat-file -p :B >B.internal &&
+		test_cmp original B.internal &&
+		git cat-file -p :C >C.internal &&
+		test_cmp original C.internal &&
+
+		rm A B C *.internal &&
+		test_checkout_workers 2 git checkout A B C &&
+
+		# Check that A (and only A) was smudged during checkout
+		test_cmp original A &&
+		test_cmp original B &&
+		test_cmp original C
+	)
+'
+
+# The delayed queue is independent from the parallel queue, and they should be
+# able to work together in the same checkout process.
+#
+test_expect_success PERL 'parallel-checkout and delayed checkout' '
+	write_script rot13-filter.pl "$PERL_PATH" \
+		<"$TEST_DIRECTORY"/t0021/rot13-filter.pl &&
+
+	test_config_global filter.delay.process \
+		"\"$(pwd)/rot13-filter.pl\" --always-delay \"$(pwd)/delayed.log\" clean smudge delay" &&
+	test_config_global filter.delay.required true &&
+
+	echo "abcd" >original &&
+	echo "nopq" >rot13 &&
+
+	git init delayed &&
+	(
+		cd delayed &&
+		echo "*.d filter=delay" >.gitattributes &&
+		cp ../original W.d &&
+		cp ../original X.d &&
+		cp ../original Y &&
+		cp ../original Z &&
+		git add -A &&
+		git commit -m delayed &&
+
+		# Check that *.d files were cleaned
+		git cat-file -p :W.d >W.d.internal &&
+		test_cmp W.d.internal ../rot13 &&
+		git cat-file -p :X.d >X.d.internal &&
+		test_cmp X.d.internal ../rot13 &&
+		git cat-file -p :Y >Y.internal &&
+		test_cmp Y.internal ../original &&
+		git cat-file -p :Z >Z.internal &&
+		test_cmp Z.internal ../original &&
+
+		rm *
+	) &&
+
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 git -C delayed checkout -f &&
+	verify_checkout delayed &&
+
+	# Check that the *.d files got to the delay queue and were filtered
+	grep "smudge W.d .* \[DELAYED\]" delayed.log &&
+	grep "smudge X.d .* \[DELAYED\]" delayed.log &&
+	test_cmp delayed/W.d original &&
+	test_cmp delayed/X.d original &&
+
+	# Check that the parallel-eligible entries went to the right queue and
+	# were not filtered
+	! grep "smudge Y .* \[DELAYED\]" delayed.log &&
+	! grep "smudge Z .* \[DELAYED\]" delayed.log &&
+	test_cmp delayed/Y original &&
+	test_cmp delayed/Z original
+'
+
+test_done
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 8/8] ci: run test round with parallel-checkout enabled
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
                       ` (6 preceding siblings ...)
  2021-05-04 16:27     ` [PATCH v3 7/8] parallel-checkout: add tests related to .gitattributes Matheus Tavares
@ 2021-05-04 16:27     ` Matheus Tavares
  2021-05-05 13:57     ` [PATCH v3 0/8] Parallel Checkout (part 3) Derrick Stolee
  8 siblings, 0 replies; 47+ messages in thread
From: Matheus Tavares @ 2021-05-04 16:27 UTC (permalink / raw)
  To: git; +Cc: christian.couder, git, stolee, tboegi

We already have tests for the basic parallel-checkout operations. But
this code can also run be executed by other commands, such as
git-read-tree and git-sparse-checkout, which are currently not tested
with multiple workers. To promote a wider test coverage without
duplicating tests:

1. Add the GIT_TEST_CHECKOUT_WORKERS environment variable, to optionally
   force parallel-checkout execution during the whole test suite.

2. Set this variable (with a value of 2) in the second test round of our
   linux-gcc CI job. This round runs `make test` again with some
   optional GIT_TEST_* variables enabled, so there is no additional
   overhead in exercising the parallel-checkout code here.

Note that tests checking out less than two parallel-eligible entries
will fall back to the sequential mode. Nevertheless, it's still a good
exercise for the parallel-checkout framework as the fallback codepath
also writes the queued entries using the parallel-checkout functions
(only without spawning any worker).

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 ci/run-build-and-tests.sh  |  1 +
 parallel-checkout.c        | 14 ++++++++++++++
 t/README                   |  4 ++++
 t/lib-parallel-checkout.sh |  3 +++
 4 files changed, 22 insertions(+)

diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index d19be40544..3ce81ffee9 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -26,6 +26,7 @@ linux-gcc)
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_WRITE_REV_INDEX=1
+	export GIT_TEST_CHECKOUT_WORKERS=2
 	make test
 	;;
 linux-clang)
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 6fb3f1e6c9..6b1af32bb3 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -35,6 +35,20 @@ static const int DEFAULT_NUM_WORKERS = 1;
 
 void get_parallel_checkout_configs(int *num_workers, int *threshold)
 {
+	char *env_workers = getenv("GIT_TEST_CHECKOUT_WORKERS");
+
+	if (env_workers && *env_workers) {
+		if (strtol_i(env_workers, 10, num_workers)) {
+			die("invalid value for GIT_TEST_CHECKOUT_WORKERS: '%s'",
+			    env_workers);
+		}
+		if (*num_workers < 1)
+			*num_workers = online_cpus();
+
+		*threshold = 0;
+		return;
+	}
+
 	if (git_config_get_int("checkout.workers", num_workers))
 		*num_workers = DEFAULT_NUM_WORKERS;
 	else if (*num_workers < 1)
diff --git a/t/README b/t/README
index 8eb9e46b1d..a8cfd37387 100644
--- a/t/README
+++ b/t/README
@@ -439,6 +439,10 @@ GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
 GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the
 sparse-index format by default.
 
+GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
+to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
+execution of the parallel-checkout code.
+
 Naming Tests
 ------------
 
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
index d6740425b1..21f5759732 100644
--- a/t/lib-parallel-checkout.sh
+++ b/t/lib-parallel-checkout.sh
@@ -1,5 +1,8 @@
 # Helpers for tests invoking parallel-checkout
 
+# Parallel checkout tests need full control of the number of workers
+unset GIT_TEST_CHECKOUT_WORKERS
+
 set_checkout_config () {
 	if test $# -ne 2
 	then
-- 
2.30.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/8] builtin/checkout.c: complete parallel checkout support
  2021-05-04 16:27     ` [PATCH v3 2/8] builtin/checkout.c: complete parallel checkout support Matheus Tavares
@ 2021-05-05 13:55       ` Derrick Stolee
  0 siblings, 0 replies; 47+ messages in thread
From: Derrick Stolee @ 2021-05-05 13:55 UTC (permalink / raw)
  To: Matheus Tavares, git; +Cc: christian.couder, git, tboegi

On 5/4/2021 12:27 PM, Matheus Tavares wrote:> @@ -359,16 +360,22 @@ static int checkout_worktree(const struct checkout_opts *opts,
>  	int nr_checkouts = 0, nr_unmerged = 0;
>  	int errs = 0;
>  	int pos;
> +	int pc_workers, pc_threshold;
> +	struct mem_pool ce_mem_pool;
>  
>  	state.force = 1;
>  	state.refresh_cache = 1;
>  	state.istate = &the_index;
>  
> +	mem_pool_init(&ce_mem_pool, 0);
> +	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
>  	init_checkout_metadata(&state.meta, info->refname,
>  			       info->commit ? &info->commit->object.oid : &info->oid,
>  			       NULL);
>  
>  	enable_delayed_checkout(&state);
> +	if (pc_workers > 1)
> +		init_parallel_checkout();
>  
>  	/* TODO: audit for interaction with sparse-index. */
>  	ensure_full_index(&the_index);

Since this context piece is new as of your rebase, I did a quick check on
all of the calls you inserted above and found them to be safe with the
sparse-index. They do not care about the number of cache entries, for
example.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 0/8] Parallel Checkout (part 3)
  2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
                       ` (7 preceding siblings ...)
  2021-05-04 16:27     ` [PATCH v3 8/8] ci: run test round with parallel-checkout enabled Matheus Tavares
@ 2021-05-05 13:57     ` Derrick Stolee
  2021-05-06  0:40       ` Junio C Hamano
  8 siblings, 1 reply; 47+ messages in thread
From: Derrick Stolee @ 2021-05-05 13:57 UTC (permalink / raw)
  To: Matheus Tavares, git; +Cc: christian.couder, git, tboegi

On 5/4/2021 12:27 PM, Matheus Tavares wrote:
> This is the last part of the parallel checkout series. It adds tests and
> parallel checkout support to `git checkout-index` and
> `git checkout <pathspec>`.
> 
> I rebased this version onto `master`, as part-2 was merged down to
> `master`.

I read the range-diff and gave the patches another pass. This version
looks good to me.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 0/8] Parallel Checkout (part 3)
  2021-05-05 13:57     ` [PATCH v3 0/8] Parallel Checkout (part 3) Derrick Stolee
@ 2021-05-06  0:40       ` Junio C Hamano
  0 siblings, 0 replies; 47+ messages in thread
From: Junio C Hamano @ 2021-05-06  0:40 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Matheus Tavares, git, christian.couder, git, tboegi

Derrick Stolee <stolee@gmail.com> writes:

> On 5/4/2021 12:27 PM, Matheus Tavares wrote:
>> This is the last part of the parallel checkout series. It adds tests and
>> parallel checkout support to `git checkout-index` and
>> `git checkout <pathspec>`.
>> 
>> I rebased this version onto `master`, as part-2 was merged down to
>> `master`.
>
> I read the range-diff and gave the patches another pass. This version
> looks good to me.

Thanks.  Your assessment matches mine.

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2021-05-06  0:40 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-22 15:17 [PATCH 0/7] Parallel Checkout (part 3) Matheus Tavares
2021-04-22 15:17 ` [PATCH 1/7] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2021-04-22 15:17 ` [PATCH 2/7] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2021-04-23 16:19   ` Derrick Stolee
2021-04-26 21:54     ` Matheus Tavares Bernardino
2021-04-22 15:17 ` [PATCH 3/7] checkout-index: add " Matheus Tavares
2021-04-23 18:32   ` Derrick Stolee
2021-04-26 22:30     ` Matheus Tavares Bernardino
2021-04-22 15:17 ` [PATCH 4/7] parallel-checkout: add tests for basic operations Matheus Tavares
2021-04-23 19:18   ` Derrick Stolee
2021-04-27  2:30     ` Matheus Tavares Bernardino
2021-04-22 15:17 ` [PATCH 5/7] parallel-checkout: add tests related to path collisions Matheus Tavares
2021-04-22 15:17 ` [PATCH 6/7] parallel-checkout: add tests related to .gitattributes Matheus Tavares
2021-04-23 19:48   ` Derrick Stolee
2021-04-22 15:17 ` [PATCH 7/7] ci: run test round with parallel-checkout enabled Matheus Tavares
2021-04-23 19:56   ` Derrick Stolee
2021-04-30 21:40 ` [PATCH v2 0/8] Parallel Checkout (part 3) Matheus Tavares
2021-04-30 21:40   ` [PATCH v2 1/8] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2021-05-01 17:06     ` Christian Couder
2021-05-03 14:11       ` Matheus Tavares Bernardino
2021-04-30 21:40   ` [PATCH v2 2/8] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2021-05-01 17:08     ` Christian Couder
2021-05-03 14:21       ` Matheus Tavares Bernardino
2021-04-30 21:40   ` [PATCH v2 3/8] checkout-index: add " Matheus Tavares
2021-05-01 17:08     ` Christian Couder
2021-05-03 14:22       ` Matheus Tavares Bernardino
2021-04-30 21:40   ` [PATCH v2 4/8] parallel-checkout: add tests for basic operations Matheus Tavares
2021-04-30 21:40   ` [PATCH v2 5/8] parallel-checkout: add tests related to path collisions Matheus Tavares
2021-05-02  7:59     ` Torsten Bögershausen
2021-05-03 14:58       ` Matheus Tavares Bernardino
2021-04-30 21:40   ` [PATCH v2 6/8] t0028: extract encoding helpers to lib-encoding.sh Matheus Tavares
2021-04-30 21:40   ` [PATCH v2 7/8] parallel-checkout: add tests related to .gitattributes Matheus Tavares
2021-04-30 21:40   ` [PATCH v2 8/8] ci: run test round with parallel-checkout enabled Matheus Tavares
2021-05-02 10:12   ` [PATCH v2 0/8] Parallel Checkout (part 3) Torsten Bögershausen
2021-05-03 15:01     ` Matheus Tavares Bernardino
2021-05-04 16:27   ` [PATCH v3 " Matheus Tavares
2021-05-04 16:27     ` [PATCH v3 1/8] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2021-05-04 16:27     ` [PATCH v3 2/8] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2021-05-05 13:55       ` Derrick Stolee
2021-05-04 16:27     ` [PATCH v3 3/8] checkout-index: add " Matheus Tavares
2021-05-04 16:27     ` [PATCH v3 4/8] parallel-checkout: add tests for basic operations Matheus Tavares
2021-05-04 16:27     ` [PATCH v3 5/8] parallel-checkout: add tests related to path collisions Matheus Tavares
2021-05-04 16:27     ` [PATCH v3 6/8] t0028: extract encoding helpers to lib-encoding.sh Matheus Tavares
2021-05-04 16:27     ` [PATCH v3 7/8] parallel-checkout: add tests related to .gitattributes Matheus Tavares
2021-05-04 16:27     ` [PATCH v3 8/8] ci: run test round with parallel-checkout enabled Matheus Tavares
2021-05-05 13:57     ` [PATCH v3 0/8] Parallel Checkout (part 3) Derrick Stolee
2021-05-06  0:40       ` Junio C Hamano

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git