git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
@ 2022-11-08 22:44 Victoria Dye via GitGitGadget
  2022-11-08 22:44 ` [PATCH 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-08 22:44 UTC (permalink / raw)
  To: git; +Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Victoria Dye

Following up on a discussion [1] around cache tree refreshes in 'git reset',
this series updates callers of 'unpack_trees()' to skip its internal
invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
expensive operation, and it is redundant when 'prime_cache_tree()' clears
and rebuilds the cache tree from scratch immediately after.

The first patch adds a test directly comparing the execution time of
'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
that on a fully-valid cache tree, they perform the same, but on a
fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
(although both are so fast that the total execution time of 100 invocations
is needed to compare the results in the default perf repo).

The second patch introduces the 'skip_cache_tree_update' option for
'unpack_trees()', but does not use it yet.

The remaining three patches update callers that make the aforementioned
redundant cache tree updates. The performance impact on these callers ranges
from "negligible" (in 'rebase') to "substantial" (in 'read-tree') - more
details can be found in the commit messages of the patch associated with the
affected code path.

Thanks!

 * Victoria

[1] https://lore.kernel.org/git/xmqqlf30edvf.fsf@gitster.g/ [2]
https://lore.kernel.org/git/f4881b7455b9d33c8a53a91eda7fbdfc5d11382c.1627066238.git.jonathantanmy@google.com/

Victoria Dye (5):
  cache-tree: add perf test comparing update and prime
  unpack-trees: add 'skip_cache_tree_update' option
  reset: use 'skip_cache_tree_update' option
  read-tree: use 'skip_cache_tree_update' option
  rebase: use 'skip_cache_tree_update' option

 Makefile                           |  1 +
 builtin/read-tree.c                |  4 +++
 builtin/reset.c                    |  2 ++
 reset.c                            |  1 +
 sequencer.c                        |  1 +
 t/helper/test-cache-tree.c         | 52 ++++++++++++++++++++++++++++++
 t/helper/test-tool.c               |  1 +
 t/helper/test-tool.h               |  1 +
 t/perf/p0006-read-tree-checkout.sh |  8 +++++
 t/perf/p0090-cache-tree.sh         | 27 ++++++++++++++++
 t/perf/p7102-reset.sh              | 21 ++++++++++++
 t/t1022-read-tree-partial-clone.sh |  2 +-
 unpack-trees.c                     |  3 +-
 unpack-trees.h                     |  3 +-
 14 files changed, 124 insertions(+), 3 deletions(-)
 create mode 100644 t/helper/test-cache-tree.c
 create mode 100755 t/perf/p0090-cache-tree.sh
 create mode 100755 t/perf/p7102-reset.sh


base-commit: 3b08839926fcc7cc48cf4c759737c1a71af430c1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1411%2Fvdye%2Ffeature%2Fcache-tree-optimization-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1411/vdye/feature/cache-tree-optimization-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1411
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/5] cache-tree: add perf test comparing update and prime
  2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
@ 2022-11-08 22:44 ` Victoria Dye via GitGitGadget
  2022-11-10  7:23   ` SZEDER Gábor
  2022-11-08 22:44 ` [PATCH 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-08 22:44 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add a performance test comparing the execution times of 'prime_cache_tree()'
and 'cache_tree_update(_, WRITE_TREE_SILENT | WRITE_TREE_REPAIR)'. The goal
of comparing these two is to identify which is the faster method for
rebuilding an invalid cache tree, ultimately to remove one when both are
(reundantly) called in immediate succession.

Both methods are incredibly fast, so the new tests in 'p0090-cache-tree.sh'
must call the tested method many times in succession to get non-negligible
timing results. Results show a substantial difference in execution time
between the two, with 'prime_cache_tree()' appearing to be the overall
faster method:

Test                                 this tree
----------------------------------------------------
0090.1: prime_cache_tree, clean      0.07(0.05+0.01)
0090.2: cache_tree_update, clean     0.11(0.05+0.06)
0090.3: prime_cache_tree, invalid    0.06(0.05+0.01)
0090.4: cache_tree_update, invalid   0.50(0.41+0.07)

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Makefile                   |  1 +
 t/helper/test-cache-tree.c | 52 ++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c       |  1 +
 t/helper/test-tool.h       |  1 +
 t/perf/p0090-cache-tree.sh | 27 ++++++++++++++++++++
 5 files changed, 82 insertions(+)
 create mode 100644 t/helper/test-cache-tree.c
 create mode 100755 t/perf/p0090-cache-tree.sh

diff --git a/Makefile b/Makefile
index 4927379184c..3639c7c2a94 100644
--- a/Makefile
+++ b/Makefile
@@ -723,6 +723,7 @@ TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
 TEST_BUILTINS_OBJS += test-bundle-uri.o
+TEST_BUILTINS_OBJS += test-cache-tree.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/t/helper/test-cache-tree.c b/t/helper/test-cache-tree.c
new file mode 100644
index 00000000000..2fad6d06d30
--- /dev/null
+++ b/t/helper/test-cache-tree.c
@@ -0,0 +1,52 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "tree.h"
+#include "cache-tree.h"
+#include "parse-options.h"
+
+static char const * const test_cache_tree_usage[] = {
+	N_("test-tool cache-tree <options> (prime|repair)"),
+	NULL
+};
+
+int cmd__cache_tree(int argc, const char **argv)
+{
+	struct object_id oid;
+	struct tree *tree;
+	int fresh = 0;
+	int count = 1;
+	int i;
+
+	struct option options[] = {
+		OPT_BOOL(0, "fresh", &fresh,
+			 N_("clear the cache tree before each repetition")),
+		OPT_INTEGER_F(0, "count", &count, N_("number of times to repeat the operation"),
+			      PARSE_OPT_NONEG),
+		OPT_END()
+	};
+
+	setup_git_directory();
+
+	parse_options(argc, argv, NULL, options, test_cache_tree_usage, 0);
+
+	if (read_cache() < 0)
+		die("unable to read index file");
+
+	get_oid("HEAD", &oid);
+	tree = parse_tree_indirect(&oid);
+	for (i = 0; i < count; i++) {
+		if (fresh)
+			cache_tree_free(&the_index.cache_tree);
+
+		if (!argc)
+			die("Must specify subcommand");
+		else if (!strcmp(argv[0], "prime"))
+			prime_cache_tree(the_repository, &the_index, tree);
+		else if (!strcmp(argv[0], "update"))
+			cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
+		else
+			die("Unknown command %s", argv[0]);
+	}
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 01cda9358df..547a3be1c8b 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -14,6 +14,7 @@ static struct test_cmd cmds[] = {
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
 	{ "bundle-uri", cmd__bundle_uri },
+	{ "cache-tree", cmd__cache_tree },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ca2948066fd..e44e1d896d3 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -8,6 +8,7 @@ int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
 int cmd__bundle_uri(int argc, const char **argv);
+int cmd__cache_tree(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/perf/p0090-cache-tree.sh b/t/perf/p0090-cache-tree.sh
new file mode 100755
index 00000000000..91c13e28a27
--- /dev/null
+++ b/t/perf/p0090-cache-tree.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+test_description="Tests performance of cache tree operations"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+count=200
+test_perf "prime_cache_tree, clean" "
+	test-tool cache-tree --count $count prime
+"
+
+test_perf "cache_tree_update, clean" "
+	test-tool cache-tree --count $count update
+"
+
+test_perf "prime_cache_tree, invalid" "
+	test-tool cache-tree --count $count --fresh prime
+"
+
+test_perf "cache_tree_update, invalid" "
+	test-tool cache-tree --count $count --fresh update
+"
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/5] unpack-trees: add 'skip_cache_tree_update' option
  2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
  2022-11-08 22:44 ` [PATCH 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
@ 2022-11-08 22:44 ` Victoria Dye via GitGitGadget
  2022-11-08 22:44 ` [PATCH 3/5] reset: use " Victoria Dye via GitGitGadget
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-08 22:44 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add (disabled by default) option to skip the 'cache_tree_update()' at the
end of 'unpack_trees()'. In many cases, this cache tree update is redundant
because the caller of 'unpack_trees()' immediately follows it with
'prime_cache_tree()', rebuilding the entire cache tree from scratch. While
these operations aren't the most expensive part of operations like 'git
reset', the duplicate calls still create a minor unnecessary slowdown.

Introduce an option for callers to skip the 'cache_tree_update()' in
'unpack_trees()' if it is redundant (that is, if 'prime_cache_tree()' is
called afterwards). At the moment, no 'unpack_trees()' callers use the new
option; they will be updated in subsequent patches.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 3 ++-
 unpack-trees.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index bae812156c4..8a762aa0772 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2043,7 +2043,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		if (!ret) {
 			if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
 				cache_tree_verify(the_repository, &o->result);
-			if (!cache_tree_fully_valid(o->result.cache_tree))
+			if (!o->skip_cache_tree_update &&
+			    !cache_tree_fully_valid(o->result.cache_tree))
 				cache_tree_update(&o->result,
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
diff --git a/unpack-trees.h b/unpack-trees.h
index efb9edfbb27..6ab0d74c84d 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -71,7 +71,8 @@ struct unpack_trees_options {
 		     quiet,
 		     exiting_early,
 		     show_all_errors,
-		     dry_run;
+		     dry_run,
+		     skip_cache_tree_update;
 	enum unpack_trees_reset_type reset;
 	const char *prefix;
 	int cache_bottom;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/5] reset: use 'skip_cache_tree_update' option
  2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
  2022-11-08 22:44 ` [PATCH 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
  2022-11-08 22:44 ` [PATCH 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
@ 2022-11-08 22:44 ` Victoria Dye via GitGitGadget
  2022-11-08 22:44 ` [PATCH 4/5] read-tree: " Victoria Dye via GitGitGadget
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-08 22:44 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Enable the 'skip_cache_tree_update' option in the variants that call
'prime_cache_tree()' after 'unpack_trees()' (specifically, 'git reset
--mixed' and 'git reset --hard'). This avoids redundantly rebuilding the
cache tree in both 'cache_tree_update()' at the end of 'unpack_trees()' and
in 'prime_cache_tree()', resulting in a small (but consistent) performance
improvement. From the newly-added 'p7102-reset.sh' test:

Test                         before            after
--------------------------------------------------------------------
7102.1: reset --hard (...)   2.11(0.40+1.54)   1.97(0.38+1.47) -6.6%

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c       |  2 ++
 t/perf/p7102-reset.sh | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)
 create mode 100755 t/perf/p7102-reset.sh

diff --git a/builtin/reset.c b/builtin/reset.c
index fdce6f8c856..ab027774824 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -73,9 +73,11 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 	case HARD:
 		opts.update = 1;
 		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
+		opts.skip_cache_tree_update = 1;
 		break;
 	case MIXED:
 		opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
+		opts.skip_cache_tree_update = 1;
 		/* but opts.update=0, so working tree not updated */
 		break;
 	default:
diff --git a/t/perf/p7102-reset.sh b/t/perf/p7102-reset.sh
new file mode 100755
index 00000000000..9b039e8691f
--- /dev/null
+++ b/t/perf/p7102-reset.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+
+test_description='performance of reset'
+. ./perf-lib.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+test_perf 'reset --hard with change in tree' '
+	base=$(git rev-parse HEAD) &&
+	test_commit --no-tag A &&
+	new=$(git rev-parse HEAD) &&
+
+	for i in $(test_seq 10)
+	do
+		git reset --hard $new &&
+		git reset --hard $base || return $?
+	done
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/5] read-tree: use 'skip_cache_tree_update' option
  2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-11-08 22:44 ` [PATCH 3/5] reset: use " Victoria Dye via GitGitGadget
@ 2022-11-08 22:44 ` Victoria Dye via GitGitGadget
  2022-11-08 22:44 ` [PATCH 5/5] rebase: " Victoria Dye via GitGitGadget
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-08 22:44 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

When running 'read-tree' with a single tree and no prefix,
'prime_cache_tree()' is called after the tree is unpacked. In that
situation, skip a redundant call to 'cache_tree_update()' in
'unpack_trees()' by enabling the 'skip_cache_tree_update' unpack option.

Removing the redundant cache tree update provides a substantial performance
improvement to 'git read-tree <tree-ish>', as shown by a test added to
'p0006-read-tree-checkout.sh':

Test                          before            after
----------------------------------------------------------------------
read-tree br_ballast_plus_1   3.94(1.80+1.57)   3.00(1.14+1.28) -23.9%

Note that the 'read-tree' in 't1022-read-tree-partial-clone.sh' is updated
to read two trees, rather than one. The test was first introduced in
d3da223f221 (cache-tree: prefetch in partial clone read-tree, 2021-07-23) to
exercise the 'cache_tree_update()' code path, as used in 'git merge'. Since
this patch drops the call to 'cache_tree_update()' in single-tree 'git
read-tree', change the test to use the two-tree variant so that
'cache_tree_update()' is called as intended.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/read-tree.c                | 4 ++++
 t/perf/p0006-read-tree-checkout.sh | 8 ++++++++
 t/t1022-read-tree-partial-clone.sh | 2 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index f4cbe460b97..45c6652444b 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -249,6 +249,10 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if (opts.debug_unpack)
 		opts.fn = debug_merge;
 
+	/* If we're going to prime_cache_tree later, skip cache tree update */
+	if (nr_trees == 1 && !opts.prefix)
+		opts.skip_cache_tree_update = 1;
+
 	cache_tree_free(&active_cache_tree);
 	for (i = 0; i < nr_trees; i++) {
 		struct tree *tree = trees[i];
diff --git a/t/perf/p0006-read-tree-checkout.sh b/t/perf/p0006-read-tree-checkout.sh
index c481c012d2f..325566e18eb 100755
--- a/t/perf/p0006-read-tree-checkout.sh
+++ b/t/perf/p0006-read-tree-checkout.sh
@@ -49,6 +49,14 @@ test_perf "read-tree br_base br_ballast ($nr_files)" '
 	git read-tree -n -m br_base br_ballast
 '
 
+test_perf "read-tree br_ballast_plus_1 ($nr_files)" '
+	# Run read-tree 100 times for clearer performance results & comparisons
+	for i in  $(test_seq 100)
+	do
+		git read-tree -n -m br_ballast_plus_1 || return 1
+	done
+'
+
 test_perf "switch between br_base br_ballast ($nr_files)" '
 	git checkout -q br_base &&
 	git checkout -q br_ballast
diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh
index a9953b6a71c..da539716359 100755
--- a/t/t1022-read-tree-partial-clone.sh
+++ b/t/t1022-read-tree-partial-clone.sh
@@ -19,7 +19,7 @@ test_expect_success 'read-tree in partial clone prefetches in one batch' '
 	git -C server config uploadpack.allowfilter 1 &&
 	git -C server config uploadpack.allowanysha1inwant 1 &&
 	git clone --bare --filter=blob:none "file://$(pwd)/server" client &&
-	GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE &&
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE $TREE &&
 
 	# "done" marks the end of negotiation (once per fetch). Expect that
 	# only one fetch occurs.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/5] rebase: use 'skip_cache_tree_update' option
  2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-11-08 22:44 ` [PATCH 4/5] read-tree: " Victoria Dye via GitGitGadget
@ 2022-11-08 22:44 ` Victoria Dye via GitGitGadget
  2022-11-09 15:23 ` [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Derrick Stolee
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-08 22:44 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Enable the 'skip_cache_tree_update' option in both 'do_reset()'
('sequencer.c') and 'reset_head()' ('reset.c'). Both of these callers invoke
'prime_cache_tree()' after 'unpack_trees()', so we can remove an unnecessary
cache tree rebuild by skipping 'cache_tree_update()'.

When testing with 'p3400-rebase.sh' and 'p3404-rebase-interactive.sh', the
performance change of this update was negligible, likely due to the
operation being dominated by more expensive operations (like checking out
trees). However, since the change doesn't harm performance, it's worth
keeping this 'unpack_trees()' usage consistent with others that subsequently
invoke 'prime_cache_tree()'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 reset.c     | 1 +
 sequencer.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/reset.c b/reset.c
index e3383a93343..5ded23611f3 100644
--- a/reset.c
+++ b/reset.c
@@ -128,6 +128,7 @@ int reset_head(struct repository *r, const struct reset_head_opts *opts)
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.merge = 1;
 	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
+	unpack_tree_opts.skip_cache_tree_update = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
 	if (reset_hard)
 		unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
diff --git a/sequencer.c b/sequencer.c
index e658df7e8ff..3f7a73ce4e1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3750,6 +3750,7 @@ static int do_reset(struct repository *r,
 	unpack_tree_opts.merge = 1;
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
+	unpack_tree_opts.skip_cache_tree_update = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
 
 	if (repo_read_index_unmerged(r)) {
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-11-08 22:44 ` [PATCH 5/5] rebase: " Victoria Dye via GitGitGadget
@ 2022-11-09 15:23 ` Derrick Stolee
  2022-11-09 22:18   ` Victoria Dye
  2022-11-09 23:01 ` Taylor Blau
  2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
  7 siblings, 1 reply; 31+ messages in thread
From: Derrick Stolee @ 2022-11-09 15:23 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: gitster, phillip.wood123, jonathantanmy, Victoria Dye

On 11/8/2022 5:44 PM, Victoria Dye via GitGitGadget wrote:
> Following up on a discussion [1] around cache tree refreshes in 'git reset',
> this series updates callers of 'unpack_trees()' to skip its internal
> invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
> immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
> expensive operation, and it is redundant when 'prime_cache_tree()' clears
> and rebuilds the cache tree from scratch immediately after.
> 
> The first patch adds a test directly comparing the execution time of
> 'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
> that on a fully-valid cache tree, they perform the same, but on a
> fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
> (although both are so fast that the total execution time of 100 invocations
> is needed to compare the results in the default perf repo).

One thing I found interesting is how you needed 200 iterations to show
a meaningful change in this test script, but in the case of 'git reset'
we can see sizeable improvements even with a single iteration.

Is there something about this test that is artificially speeding up
these iterations? Perhaps the index has up-to-date filesystem information
that allows these methods to avoid filesystem interactions that are
necessary in the 'git reset' case?
 
> The second patch introduces the 'skip_cache_tree_update' option for
> 'unpack_trees()', but does not use it yet.
> 
> The remaining three patches update callers that make the aforementioned
> redundant cache tree updates. The performance impact on these callers ranges
> from "negligible" (in 'rebase') to "substantial" (in 'read-tree') - more
> details can be found in the commit messages of the patch associated with the
> affected code path.

I found these patches well motivated and the code change to be so
unobtrusive that the benefits are well worth the new options.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-09 15:23 ` [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Derrick Stolee
@ 2022-11-09 22:18   ` Victoria Dye
  2022-11-10 14:44     ` Derrick Stolee
  0 siblings, 1 reply; 31+ messages in thread
From: Victoria Dye @ 2022-11-09 22:18 UTC (permalink / raw)
  To: Derrick Stolee, Victoria Dye via GitGitGadget, git
  Cc: gitster, phillip.wood123, jonathantanmy

Derrick Stolee wrote:
> On 11/8/2022 5:44 PM, Victoria Dye via GitGitGadget wrote:
>> Following up on a discussion [1] around cache tree refreshes in 'git reset',
>> this series updates callers of 'unpack_trees()' to skip its internal
>> invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
>> immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
>> expensive operation, and it is redundant when 'prime_cache_tree()' clears
>> and rebuilds the cache tree from scratch immediately after.
>>
>> The first patch adds a test directly comparing the execution time of
>> 'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
>> that on a fully-valid cache tree, they perform the same, but on a
>> fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
>> (although both are so fast that the total execution time of 100 invocations
>> is needed to compare the results in the default perf repo).
> 
> One thing I found interesting is how you needed 200 iterations to show
> a meaningful change in this test script, but in the case of 'git reset'
> we can see sizeable improvements even with a single iteration.

All of the new performance tests run with multiple iterations: 20 for reset
(10 iterations of two resets each), 100 for read-tree, 200 for the
comparison of 'cache_tree_update()' & 'prime_cache_tree()'. Those counts
were picked mostly by trial-and-error, to strike a balance of "the test
doesn't take too long to run" and "the change in execution time is clearly
visible in the results."

> 
> Is there something about this test that is artificially speeding up
> these iterations? Perhaps the index has up-to-date filesystem information
> that allows these methods to avoid filesystem interactions that are
> necessary in the 'git reset' case?

I would expect the "cache_tree_update, invalid" test's execution time, when
scaled to the iterations of 'read-tree' and 'reset', to match the change in
timing of those commands, but the command tests are reporting *much* larger
improvements (e.g., I'd expect a 0.27s improvement in 'git read-tree', but
the results are *consistently* >=0.9s).

Per trace2 logs, a single invocation of 'read-tree' matching the one added
in 'p0006' spent 0.010108s in 'cache_tree_update()'. Over 100 iterations,
the total time would be ~1.01s, which lines up with the 'p0006' test
results. However, the trace2 results for "test-tool cache-tree --count 3
--fresh --update" show the first iteration taking 0.013060s (looks good),
then the next taking 0.003755s, then 0.004026s (_much_ faster than
expected).

To be honest, I can't figure out what's going on there. It might be some
kind of runtime/memory optimization with repeatedly rebuilding the same
cache tree (doesn't seem to be compiler optimization, since the speedup
still happens with '-O0'). The only sure-fire way to avoid it seems to be
moving the iteration outside of 'test-cache-tree.c' and into 'p0090'.
Unfortunately, the command initialization overhead *really* slows things
down, but I can add a "control" test (with no cache tree refresh) to
quantify how long that initialization takes.

While looking into this, I found a few other things I'd like to add to/fix
in that test (add a "partially-invalidated" cache tree case, use the
original cache tree OID in 'prime_cache_tree()' rather than the OID at
HEAD), so I'll re-roll with those + the updated iteration logic.

Thanks for bringing this up!

>  
>> The second patch introduces the 'skip_cache_tree_update' option for
>> 'unpack_trees()', but does not use it yet.
>>
>> The remaining three patches update callers that make the aforementioned
>> redundant cache tree updates. The performance impact on these callers ranges
>> from "negligible" (in 'rebase') to "substantial" (in 'read-tree') - more
>> details can be found in the commit messages of the patch associated with the
>> affected code path.
> 
> I found these patches well motivated and the code change to be so
> unobtrusive that the benefits are well worth the new options.

Thanks!

> 
> Thanks,
> -Stolee


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-11-09 15:23 ` [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Derrick Stolee
@ 2022-11-09 23:01 ` Taylor Blau
  2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
  7 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2022-11-09 23:01 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Victoria Dye

On Tue, Nov 08, 2022 at 10:44:20PM +0000, Victoria Dye via GitGitGadget wrote:
> Victoria Dye (5):
>   cache-tree: add perf test comparing update and prime
>   unpack-trees: add 'skip_cache_tree_update' option
>   reset: use 'skip_cache_tree_update' option
>   read-tree: use 'skip_cache_tree_update' option
>   rebase: use 'skip_cache_tree_update' option

Very cleanly done and demonstrated. I'll keep an eye out for the reroll
that you and Stolee discussed below.


Thanks,
Taylor

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
                   ` (6 preceding siblings ...)
  2022-11-09 23:01 ` Taylor Blau
@ 2022-11-10  1:57 ` Victoria Dye via GitGitGadget
  2022-11-10  1:57   ` [PATCH v2 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
                     ` (7 more replies)
  7 siblings, 8 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10  1:57 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau, Victoria Dye

Following up on a discussion [1] around cache tree refreshes in 'git reset',
this series updates callers of 'unpack_trees()' to skip its internal
invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
expensive operation, and it is redundant when 'prime_cache_tree()' clears
and rebuilds the cache tree from scratch immediately after.

The first patch adds a test directly comparing the execution time of
'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
that on a fully-valid cache tree, they perform the same, but on a partially-
or fully-invalid cache tree (the more likely case in commands with the
aforementioned redundancy), 'prime_cache_tree()' is faster.

The second patch introduces the 'skip_cache_tree_update' option for
'unpack_trees()', but does not use it yet.

The remaining three patches update callers that make the aforementioned
redundant cache tree updates. The performance impact on these callers ranges
from "negligible" (in 'rebase') to "substantial" (in 'read-tree') - more
details can be found in the commit messages of the patch associated with the
affected code path.


Changes since V1
================

 * Rewrote 'p0090' to more accurately and reliably test 'prime_cache_tree()'
   vs. 'cache_tree_update()'.
   * Moved iterative cache tree update out of C and into the shell tests (to
     avoid potential runtime optimizations)
   * Added a "control" test to document how much of the execution time is
     startup overhead
   * Added tests demonstrating performance in partially-invalid cache trees.
 * Fixed the use of 'prime_cache_tree()' in 'test-tool cache-tree', changing
   it from using the tree at HEAD to the current cache tree.

Thanks!

 * Victoria

[1] https://lore.kernel.org/git/xmqqlf30edvf.fsf@gitster.g/ [2]
https://lore.kernel.org/git/f4881b7455b9d33c8a53a91eda7fbdfc5d11382c.1627066238.git.jonathantanmy@google.com/

Victoria Dye (5):
  cache-tree: add perf test comparing update and prime
  unpack-trees: add 'skip_cache_tree_update' option
  reset: use 'skip_cache_tree_update' option
  read-tree: use 'skip_cache_tree_update' option
  rebase: use 'skip_cache_tree_update' option

 Makefile                           |  1 +
 builtin/read-tree.c                |  4 ++
 builtin/reset.c                    |  2 +
 reset.c                            |  1 +
 sequencer.c                        |  1 +
 t/helper/test-cache-tree.c         | 64 ++++++++++++++++++++++++++++++
 t/helper/test-tool.c               |  1 +
 t/helper/test-tool.h               |  1 +
 t/perf/p0006-read-tree-checkout.sh |  8 ++++
 t/perf/p0090-cache-tree.sh         | 36 +++++++++++++++++
 t/perf/p7102-reset.sh              | 21 ++++++++++
 t/t1022-read-tree-partial-clone.sh |  2 +-
 unpack-trees.c                     |  3 +-
 unpack-trees.h                     |  3 +-
 14 files changed, 145 insertions(+), 3 deletions(-)
 create mode 100644 t/helper/test-cache-tree.c
 create mode 100755 t/perf/p0090-cache-tree.sh
 create mode 100755 t/perf/p7102-reset.sh


base-commit: 3b08839926fcc7cc48cf4c759737c1a71af430c1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1411%2Fvdye%2Ffeature%2Fcache-tree-optimization-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1411/vdye/feature/cache-tree-optimization-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1411

Range-diff vs v1:

 1:  45c198c629d ! 1:  833519d87c8 cache-tree: add perf test comparing update and prime
     @@ Commit message
          rebuilding an invalid cache tree, ultimately to remove one when both are
          (reundantly) called in immediate succession.
      
     -    Both methods are incredibly fast, so the new tests in 'p0090-cache-tree.sh'
     -    must call the tested method many times in succession to get non-negligible
     -    timing results. Results show a substantial difference in execution time
     -    between the two, with 'prime_cache_tree()' appearing to be the overall
     -    faster method:
     +    Both methods are fast, so the new tests in 'p0090-cache-tree.sh' must call
     +    each tested function multiple times to ensure the reported times (to 0.01s
     +    resolution) convey the differences between them.
      
     -    Test                                 this tree
     -    ----------------------------------------------------
     -    0090.1: prime_cache_tree, clean      0.07(0.05+0.01)
     -    0090.2: cache_tree_update, clean     0.11(0.05+0.06)
     -    0090.3: prime_cache_tree, invalid    0.06(0.05+0.01)
     -    0090.4: cache_tree_update, invalid   0.50(0.41+0.07)
     +    The tests compare the timing of a 'test-tool cache-tree' run as a no-op (to
     +    capture a baseline for the overhead associated with running the tool),
     +    'cache_tree_update()', and 'prime_cache_tree()' on four scenarios:
     +
     +    - A completely valid cache tree
     +    - A cache tree with 2 invalid paths
     +    - A cache tree with 50 invalid paths
     +    - A completely empty cache tree
     +
     +    Example results:
     +
     +    Test                                        this tree
     +    -----------------------------------------------------------
     +    0090.2: no-op, clean                        1.27(0.48+0.52)
     +    0090.3: prime_cache_tree, clean             2.02(0.83+0.85)
     +    0090.4: cache_tree_update, clean            1.30(0.49+0.54)
     +    0090.5: no-op, invalidate 2                 1.29(0.48+0.54)
     +    0090.6: prime_cache_tree, invalidate 2      1.98(0.81+0.83)
     +    0090.7: cache_tree_update, invalidate 2     2.12(0.94+0.86)
     +    0090.8: no-op, invalidate 50                1.32(0.50+0.55)
     +    0090.9: prime_cache_tree, invalidate 50     2.10(0.86+0.89)
     +    0090.10: cache_tree_update, invalidate 50   2.35(1.14+0.90)
     +    0090.11: no-op, empty                       1.33(0.50+0.54)
     +    0090.12: prime_cache_tree, empty            2.04(0.84+0.87)
     +    0090.13: cache_tree_update, empty           2.51(1.27+0.92)
     +
     +    These timings show that, while 'cache_tree_update()' is faster when the
     +    cache tree is completely valid, it is equal to or slower than
     +    'prime_cache_tree()' when there are any invalid paths. Since the redundant
     +    calls are mostly in scenarios where the cache tree will be at least
     +    partially invalid (e.g., 'git reset --hard'), 'prime_cache_tree()' will
     +    likely perform better than 'cache_tree_update()' in typical cases.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     @@ t/helper/test-cache-tree.c (new)
      +#include "parse-options.h"
      +
      +static char const * const test_cache_tree_usage[] = {
     -+	N_("test-tool cache-tree <options> (prime|repair)"),
     ++	N_("test-tool cache-tree <options> (control|prime|update)"),
      +	NULL
      +};
      +
     @@ t/helper/test-cache-tree.c (new)
      +{
      +	struct object_id oid;
      +	struct tree *tree;
     -+	int fresh = 0;
     -+	int count = 1;
     ++	int empty = 0;
     ++	int invalidate_qty = 0;
      +	int i;
      +
      +	struct option options[] = {
     -+		OPT_BOOL(0, "fresh", &fresh,
     -+			 N_("clear the cache tree before each repetition")),
     -+		OPT_INTEGER_F(0, "count", &count, N_("number of times to repeat the operation"),
     ++		OPT_BOOL(0, "empty", &empty,
     ++			 N_("clear the cache tree before each iteration")),
     ++		OPT_INTEGER_F(0, "invalidate", &invalidate_qty,
     ++			      N_("number of entries in the cache tree to invalidate (default 0)"),
      +			      PARSE_OPT_NONEG),
      +		OPT_END()
      +	};
     @@ t/helper/test-cache-tree.c (new)
      +	if (read_cache() < 0)
      +		die("unable to read index file");
      +
     -+	get_oid("HEAD", &oid);
     ++	oidcpy(&oid, &the_index.cache_tree->oid);
      +	tree = parse_tree_indirect(&oid);
     -+	for (i = 0; i < count; i++) {
     -+		if (fresh)
     -+			cache_tree_free(&the_index.cache_tree);
     -+
     -+		if (!argc)
     -+			die("Must specify subcommand");
     -+		else if (!strcmp(argv[0], "prime"))
     -+			prime_cache_tree(the_repository, &the_index, tree);
     -+		else if (!strcmp(argv[0], "update"))
     -+			cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
     -+		else
     -+			die("Unknown command %s", argv[0]);
     ++	if (!tree)
     ++		die(_("not a tree object: %s"), oid_to_hex(&oid));
     ++
     ++	if (empty) {
     ++		/* clear the cache tree & allocate a new one */
     ++		cache_tree_free(&the_index.cache_tree);
     ++		the_index.cache_tree = cache_tree();
     ++	} else if (invalidate_qty) {
     ++		/* invalidate the specified number of unique paths */
     ++		float f_interval = (float)the_index.cache_nr / invalidate_qty;
     ++		int interval = f_interval < 1.0 ? 1 : (int)f_interval;
     ++		for (i = 0; i < invalidate_qty && i * interval < the_index.cache_nr; i++)
     ++			cache_tree_invalidate_path(&the_index, the_index.cache[i * interval]->name);
      +	}
      +
     ++	if (!argc)
     ++		die("Must specify subcommand");
     ++	else if (!strcmp(argv[0], "prime"))
     ++		prime_cache_tree(the_repository, &the_index, tree);
     ++	else if (!strcmp(argv[0], "update"))
     ++		cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
     ++	/* use "control" subcommand to specify no-op */
     ++	else if (!!strcmp(argv[0], "control"))
     ++		die("Unknown command %s", argv[0]);
     ++
      +	return 0;
      +}
      
     @@ t/perf/p0090-cache-tree.sh (new)
      @@
      +#!/bin/sh
      +
     -+test_description="Tests performance of cache tree operations"
     ++test_description="Tests performance of cache tree update operations"
      +
      +. ./perf-lib.sh
      +
      +test_perf_large_repo
      +test_checkout_worktree
      +
     -+count=200
     -+test_perf "prime_cache_tree, clean" "
     -+	test-tool cache-tree --count $count prime
     -+"
     ++count=100
     ++
     ++test_expect_success 'setup cache tree' '
     ++	git write-tree
     ++'
      +
     -+test_perf "cache_tree_update, clean" "
     -+	test-tool cache-tree --count $count update
     -+"
     ++test_cache_tree () {
     ++	test_perf "$1, $3" "
     ++		for i in \$(test_seq $count)
     ++		do
     ++			test-tool cache-tree $4 $2
     ++		done
     ++	"
     ++}
      +
     -+test_perf "prime_cache_tree, invalid" "
     -+	test-tool cache-tree --count $count --fresh prime
     -+"
     ++test_cache_tree_update_functions () {
     ++	test_cache_tree 'no-op' 'control' "$1" "$2"
     ++	test_cache_tree 'prime_cache_tree' 'prime' "$1" "$2"
     ++	test_cache_tree 'cache_tree_update' 'update' "$1" "$2"
     ++}
      +
     -+test_perf "cache_tree_update, invalid" "
     -+	test-tool cache-tree --count $count --fresh update
     -+"
     ++test_cache_tree_update_functions "clean" ""
     ++test_cache_tree_update_functions "invalidate 2" "--invalidate 2"
     ++test_cache_tree_update_functions "invalidate 50" "--invalidate 50"
     ++test_cache_tree_update_functions "empty" "--empty"
      +
      +test_done
 2:  d0a20cafd39 = 2:  b015a4f531c unpack-trees: add 'skip_cache_tree_update' option
 3:  908fe764670 = 3:  4f6039971b8 reset: use 'skip_cache_tree_update' option
 4:  319f1d71b2e = 4:  5a646bc47c9 read-tree: use 'skip_cache_tree_update' option
 5:  dd4edd7cad8 = 5:  fffe2fc17ed rebase: use 'skip_cache_tree_update' option

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 1/5] cache-tree: add perf test comparing update and prime
  2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
@ 2022-11-10  1:57   ` Victoria Dye via GitGitGadget
  2022-11-10  1:57   ` [PATCH v2 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10  1:57 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add a performance test comparing the execution times of 'prime_cache_tree()'
and 'cache_tree_update(_, WRITE_TREE_SILENT | WRITE_TREE_REPAIR)'. The goal
of comparing these two is to identify which is the faster method for
rebuilding an invalid cache tree, ultimately to remove one when both are
(reundantly) called in immediate succession.

Both methods are fast, so the new tests in 'p0090-cache-tree.sh' must call
each tested function multiple times to ensure the reported times (to 0.01s
resolution) convey the differences between them.

The tests compare the timing of a 'test-tool cache-tree' run as a no-op (to
capture a baseline for the overhead associated with running the tool),
'cache_tree_update()', and 'prime_cache_tree()' on four scenarios:

- A completely valid cache tree
- A cache tree with 2 invalid paths
- A cache tree with 50 invalid paths
- A completely empty cache tree

Example results:

Test                                        this tree
-----------------------------------------------------------
0090.2: no-op, clean                        1.27(0.48+0.52)
0090.3: prime_cache_tree, clean             2.02(0.83+0.85)
0090.4: cache_tree_update, clean            1.30(0.49+0.54)
0090.5: no-op, invalidate 2                 1.29(0.48+0.54)
0090.6: prime_cache_tree, invalidate 2      1.98(0.81+0.83)
0090.7: cache_tree_update, invalidate 2     2.12(0.94+0.86)
0090.8: no-op, invalidate 50                1.32(0.50+0.55)
0090.9: prime_cache_tree, invalidate 50     2.10(0.86+0.89)
0090.10: cache_tree_update, invalidate 50   2.35(1.14+0.90)
0090.11: no-op, empty                       1.33(0.50+0.54)
0090.12: prime_cache_tree, empty            2.04(0.84+0.87)
0090.13: cache_tree_update, empty           2.51(1.27+0.92)

These timings show that, while 'cache_tree_update()' is faster when the
cache tree is completely valid, it is equal to or slower than
'prime_cache_tree()' when there are any invalid paths. Since the redundant
calls are mostly in scenarios where the cache tree will be at least
partially invalid (e.g., 'git reset --hard'), 'prime_cache_tree()' will
likely perform better than 'cache_tree_update()' in typical cases.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Makefile                   |  1 +
 t/helper/test-cache-tree.c | 64 ++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c       |  1 +
 t/helper/test-tool.h       |  1 +
 t/perf/p0090-cache-tree.sh | 36 +++++++++++++++++++++
 5 files changed, 103 insertions(+)
 create mode 100644 t/helper/test-cache-tree.c
 create mode 100755 t/perf/p0090-cache-tree.sh

diff --git a/Makefile b/Makefile
index 4927379184c..3639c7c2a94 100644
--- a/Makefile
+++ b/Makefile
@@ -723,6 +723,7 @@ TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
 TEST_BUILTINS_OBJS += test-bundle-uri.o
+TEST_BUILTINS_OBJS += test-cache-tree.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/t/helper/test-cache-tree.c b/t/helper/test-cache-tree.c
new file mode 100644
index 00000000000..8d06039fb5c
--- /dev/null
+++ b/t/helper/test-cache-tree.c
@@ -0,0 +1,64 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "tree.h"
+#include "cache-tree.h"
+#include "parse-options.h"
+
+static char const * const test_cache_tree_usage[] = {
+	N_("test-tool cache-tree <options> (control|prime|update)"),
+	NULL
+};
+
+int cmd__cache_tree(int argc, const char **argv)
+{
+	struct object_id oid;
+	struct tree *tree;
+	int empty = 0;
+	int invalidate_qty = 0;
+	int i;
+
+	struct option options[] = {
+		OPT_BOOL(0, "empty", &empty,
+			 N_("clear the cache tree before each iteration")),
+		OPT_INTEGER_F(0, "invalidate", &invalidate_qty,
+			      N_("number of entries in the cache tree to invalidate (default 0)"),
+			      PARSE_OPT_NONEG),
+		OPT_END()
+	};
+
+	setup_git_directory();
+
+	parse_options(argc, argv, NULL, options, test_cache_tree_usage, 0);
+
+	if (read_cache() < 0)
+		die("unable to read index file");
+
+	oidcpy(&oid, &the_index.cache_tree->oid);
+	tree = parse_tree_indirect(&oid);
+	if (!tree)
+		die(_("not a tree object: %s"), oid_to_hex(&oid));
+
+	if (empty) {
+		/* clear the cache tree & allocate a new one */
+		cache_tree_free(&the_index.cache_tree);
+		the_index.cache_tree = cache_tree();
+	} else if (invalidate_qty) {
+		/* invalidate the specified number of unique paths */
+		float f_interval = (float)the_index.cache_nr / invalidate_qty;
+		int interval = f_interval < 1.0 ? 1 : (int)f_interval;
+		for (i = 0; i < invalidate_qty && i * interval < the_index.cache_nr; i++)
+			cache_tree_invalidate_path(&the_index, the_index.cache[i * interval]->name);
+	}
+
+	if (!argc)
+		die("Must specify subcommand");
+	else if (!strcmp(argv[0], "prime"))
+		prime_cache_tree(the_repository, &the_index, tree);
+	else if (!strcmp(argv[0], "update"))
+		cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
+	/* use "control" subcommand to specify no-op */
+	else if (!!strcmp(argv[0], "control"))
+		die("Unknown command %s", argv[0]);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 01cda9358df..547a3be1c8b 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -14,6 +14,7 @@ static struct test_cmd cmds[] = {
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
 	{ "bundle-uri", cmd__bundle_uri },
+	{ "cache-tree", cmd__cache_tree },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ca2948066fd..e44e1d896d3 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -8,6 +8,7 @@ int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
 int cmd__bundle_uri(int argc, const char **argv);
+int cmd__cache_tree(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/perf/p0090-cache-tree.sh b/t/perf/p0090-cache-tree.sh
new file mode 100755
index 00000000000..a8eabca2c4d
--- /dev/null
+++ b/t/perf/p0090-cache-tree.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description="Tests performance of cache tree update operations"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+count=100
+
+test_expect_success 'setup cache tree' '
+	git write-tree
+'
+
+test_cache_tree () {
+	test_perf "$1, $3" "
+		for i in \$(test_seq $count)
+		do
+			test-tool cache-tree $4 $2
+		done
+	"
+}
+
+test_cache_tree_update_functions () {
+	test_cache_tree 'no-op' 'control' "$1" "$2"
+	test_cache_tree 'prime_cache_tree' 'prime' "$1" "$2"
+	test_cache_tree 'cache_tree_update' 'update' "$1" "$2"
+}
+
+test_cache_tree_update_functions "clean" ""
+test_cache_tree_update_functions "invalidate 2" "--invalidate 2"
+test_cache_tree_update_functions "invalidate 50" "--invalidate 50"
+test_cache_tree_update_functions "empty" "--empty"
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 2/5] unpack-trees: add 'skip_cache_tree_update' option
  2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
  2022-11-10  1:57   ` [PATCH v2 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
@ 2022-11-10  1:57   ` Victoria Dye via GitGitGadget
  2022-11-10  1:57   ` [PATCH v2 3/5] reset: use " Victoria Dye via GitGitGadget
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10  1:57 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add (disabled by default) option to skip the 'cache_tree_update()' at the
end of 'unpack_trees()'. In many cases, this cache tree update is redundant
because the caller of 'unpack_trees()' immediately follows it with
'prime_cache_tree()', rebuilding the entire cache tree from scratch. While
these operations aren't the most expensive part of operations like 'git
reset', the duplicate calls still create a minor unnecessary slowdown.

Introduce an option for callers to skip the 'cache_tree_update()' in
'unpack_trees()' if it is redundant (that is, if 'prime_cache_tree()' is
called afterwards). At the moment, no 'unpack_trees()' callers use the new
option; they will be updated in subsequent patches.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 3 ++-
 unpack-trees.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index bae812156c4..8a762aa0772 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2043,7 +2043,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		if (!ret) {
 			if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
 				cache_tree_verify(the_repository, &o->result);
-			if (!cache_tree_fully_valid(o->result.cache_tree))
+			if (!o->skip_cache_tree_update &&
+			    !cache_tree_fully_valid(o->result.cache_tree))
 				cache_tree_update(&o->result,
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
diff --git a/unpack-trees.h b/unpack-trees.h
index efb9edfbb27..6ab0d74c84d 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -71,7 +71,8 @@ struct unpack_trees_options {
 		     quiet,
 		     exiting_early,
 		     show_all_errors,
-		     dry_run;
+		     dry_run,
+		     skip_cache_tree_update;
 	enum unpack_trees_reset_type reset;
 	const char *prefix;
 	int cache_bottom;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 3/5] reset: use 'skip_cache_tree_update' option
  2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
  2022-11-10  1:57   ` [PATCH v2 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
  2022-11-10  1:57   ` [PATCH v2 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
@ 2022-11-10  1:57   ` Victoria Dye via GitGitGadget
  2022-11-10  1:57   ` [PATCH v2 4/5] read-tree: " Victoria Dye via GitGitGadget
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10  1:57 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Enable the 'skip_cache_tree_update' option in the variants that call
'prime_cache_tree()' after 'unpack_trees()' (specifically, 'git reset
--mixed' and 'git reset --hard'). This avoids redundantly rebuilding the
cache tree in both 'cache_tree_update()' at the end of 'unpack_trees()' and
in 'prime_cache_tree()', resulting in a small (but consistent) performance
improvement. From the newly-added 'p7102-reset.sh' test:

Test                         before            after
--------------------------------------------------------------------
7102.1: reset --hard (...)   2.11(0.40+1.54)   1.97(0.38+1.47) -6.6%

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c       |  2 ++
 t/perf/p7102-reset.sh | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)
 create mode 100755 t/perf/p7102-reset.sh

diff --git a/builtin/reset.c b/builtin/reset.c
index fdce6f8c856..ab027774824 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -73,9 +73,11 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 	case HARD:
 		opts.update = 1;
 		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
+		opts.skip_cache_tree_update = 1;
 		break;
 	case MIXED:
 		opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
+		opts.skip_cache_tree_update = 1;
 		/* but opts.update=0, so working tree not updated */
 		break;
 	default:
diff --git a/t/perf/p7102-reset.sh b/t/perf/p7102-reset.sh
new file mode 100755
index 00000000000..9b039e8691f
--- /dev/null
+++ b/t/perf/p7102-reset.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+
+test_description='performance of reset'
+. ./perf-lib.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+test_perf 'reset --hard with change in tree' '
+	base=$(git rev-parse HEAD) &&
+	test_commit --no-tag A &&
+	new=$(git rev-parse HEAD) &&
+
+	for i in $(test_seq 10)
+	do
+		git reset --hard $new &&
+		git reset --hard $base || return $?
+	done
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 4/5] read-tree: use 'skip_cache_tree_update' option
  2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-11-10  1:57   ` [PATCH v2 3/5] reset: use " Victoria Dye via GitGitGadget
@ 2022-11-10  1:57   ` Victoria Dye via GitGitGadget
  2022-11-10  1:57   ` [PATCH v2 5/5] rebase: " Victoria Dye via GitGitGadget
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10  1:57 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

When running 'read-tree' with a single tree and no prefix,
'prime_cache_tree()' is called after the tree is unpacked. In that
situation, skip a redundant call to 'cache_tree_update()' in
'unpack_trees()' by enabling the 'skip_cache_tree_update' unpack option.

Removing the redundant cache tree update provides a substantial performance
improvement to 'git read-tree <tree-ish>', as shown by a test added to
'p0006-read-tree-checkout.sh':

Test                          before            after
----------------------------------------------------------------------
read-tree br_ballast_plus_1   3.94(1.80+1.57)   3.00(1.14+1.28) -23.9%

Note that the 'read-tree' in 't1022-read-tree-partial-clone.sh' is updated
to read two trees, rather than one. The test was first introduced in
d3da223f221 (cache-tree: prefetch in partial clone read-tree, 2021-07-23) to
exercise the 'cache_tree_update()' code path, as used in 'git merge'. Since
this patch drops the call to 'cache_tree_update()' in single-tree 'git
read-tree', change the test to use the two-tree variant so that
'cache_tree_update()' is called as intended.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/read-tree.c                | 4 ++++
 t/perf/p0006-read-tree-checkout.sh | 8 ++++++++
 t/t1022-read-tree-partial-clone.sh | 2 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index f4cbe460b97..45c6652444b 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -249,6 +249,10 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if (opts.debug_unpack)
 		opts.fn = debug_merge;
 
+	/* If we're going to prime_cache_tree later, skip cache tree update */
+	if (nr_trees == 1 && !opts.prefix)
+		opts.skip_cache_tree_update = 1;
+
 	cache_tree_free(&active_cache_tree);
 	for (i = 0; i < nr_trees; i++) {
 		struct tree *tree = trees[i];
diff --git a/t/perf/p0006-read-tree-checkout.sh b/t/perf/p0006-read-tree-checkout.sh
index c481c012d2f..325566e18eb 100755
--- a/t/perf/p0006-read-tree-checkout.sh
+++ b/t/perf/p0006-read-tree-checkout.sh
@@ -49,6 +49,14 @@ test_perf "read-tree br_base br_ballast ($nr_files)" '
 	git read-tree -n -m br_base br_ballast
 '
 
+test_perf "read-tree br_ballast_plus_1 ($nr_files)" '
+	# Run read-tree 100 times for clearer performance results & comparisons
+	for i in  $(test_seq 100)
+	do
+		git read-tree -n -m br_ballast_plus_1 || return 1
+	done
+'
+
 test_perf "switch between br_base br_ballast ($nr_files)" '
 	git checkout -q br_base &&
 	git checkout -q br_ballast
diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh
index a9953b6a71c..da539716359 100755
--- a/t/t1022-read-tree-partial-clone.sh
+++ b/t/t1022-read-tree-partial-clone.sh
@@ -19,7 +19,7 @@ test_expect_success 'read-tree in partial clone prefetches in one batch' '
 	git -C server config uploadpack.allowfilter 1 &&
 	git -C server config uploadpack.allowanysha1inwant 1 &&
 	git clone --bare --filter=blob:none "file://$(pwd)/server" client &&
-	GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE &&
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE $TREE &&
 
 	# "done" marks the end of negotiation (once per fetch). Expect that
 	# only one fetch occurs.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 5/5] rebase: use 'skip_cache_tree_update' option
  2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-11-10  1:57   ` [PATCH v2 4/5] read-tree: " Victoria Dye via GitGitGadget
@ 2022-11-10  1:57   ` Victoria Dye via GitGitGadget
  2022-11-10 14:40     ` Phillip Wood
  2022-11-10  2:12   ` [PATCH v2 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Taylor Blau
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10  1:57 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Enable the 'skip_cache_tree_update' option in both 'do_reset()'
('sequencer.c') and 'reset_head()' ('reset.c'). Both of these callers invoke
'prime_cache_tree()' after 'unpack_trees()', so we can remove an unnecessary
cache tree rebuild by skipping 'cache_tree_update()'.

When testing with 'p3400-rebase.sh' and 'p3404-rebase-interactive.sh', the
performance change of this update was negligible, likely due to the
operation being dominated by more expensive operations (like checking out
trees). However, since the change doesn't harm performance, it's worth
keeping this 'unpack_trees()' usage consistent with others that subsequently
invoke 'prime_cache_tree()'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 reset.c     | 1 +
 sequencer.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/reset.c b/reset.c
index e3383a93343..5ded23611f3 100644
--- a/reset.c
+++ b/reset.c
@@ -128,6 +128,7 @@ int reset_head(struct repository *r, const struct reset_head_opts *opts)
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.merge = 1;
 	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
+	unpack_tree_opts.skip_cache_tree_update = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
 	if (reset_hard)
 		unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
diff --git a/sequencer.c b/sequencer.c
index e658df7e8ff..3f7a73ce4e1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3750,6 +3750,7 @@ static int do_reset(struct repository *r,
 	unpack_tree_opts.merge = 1;
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
+	unpack_tree_opts.skip_cache_tree_update = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
 
 	if (repo_read_index_unmerged(r)) {
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-11-10  1:57   ` [PATCH v2 5/5] rebase: " Victoria Dye via GitGitGadget
@ 2022-11-10  2:12   ` Taylor Blau
  2022-11-10 17:26   ` Derrick Stolee
  2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
  7 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2022-11-10  2:12 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau, Victoria Dye

On Thu, Nov 10, 2022 at 01:57:12AM +0000, Victoria Dye via GitGitGadget wrote:
> Changes since V1
> ================
>
>  * Rewrote 'p0090' to more accurately and reliably test 'prime_cache_tree()'
>    vs. 'cache_tree_update()'.
>    * Moved iterative cache tree update out of C and into the shell tests (to
>      avoid potential runtime optimizations)
>    * Added a "control" test to document how much of the execution time is
>      startup overhead
>    * Added tests demonstrating performance in partially-invalid cache trees.
>  * Fixed the use of 'prime_cache_tree()' in 'test-tool cache-tree', changing
>    it from using the tree at HEAD to the current cache tree.

All seem very reasonable to me, and the range-diff matches what you say.

Let's hear from Stolee, who reviewed the first round, too, and then we
should feel comfortable to start merging this down.


Thanks,
Taylor

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/5] cache-tree: add perf test comparing update and prime
  2022-11-08 22:44 ` [PATCH 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
@ 2022-11-10  7:23   ` SZEDER Gábor
  0 siblings, 0 replies; 31+ messages in thread
From: SZEDER Gábor @ 2022-11-10  7:23 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Victoria Dye

On Tue, Nov 08, 2022 at 10:44:21PM +0000, Victoria Dye via GitGitGadget wrote:
> diff --git a/t/helper/test-cache-tree.c b/t/helper/test-cache-tree.c
> new file mode 100644
> index 00000000000..2fad6d06d30
> --- /dev/null
> +++ b/t/helper/test-cache-tree.c
> @@ -0,0 +1,52 @@
> +#include "test-tool.h"
> +#include "cache.h"
> +#include "tree.h"
> +#include "cache-tree.h"
> +#include "parse-options.h"
> +
> +static char const * const test_cache_tree_usage[] = {
> +	N_("test-tool cache-tree <options> (prime|repair)"),

The code looking at 'argv[0]' below only handles "prime" and "update",
but not "repair".

> +	NULL
> +};
> +
> +int cmd__cache_tree(int argc, const char **argv)
> +{
> +	struct object_id oid;
> +	struct tree *tree;
> +	int fresh = 0;
> +	int count = 1;
> +	int i;
> +
> +	struct option options[] = {
> +		OPT_BOOL(0, "fresh", &fresh,
> +			 N_("clear the cache tree before each repetition")),
> +		OPT_INTEGER_F(0, "count", &count, N_("number of times to repeat the operation"),
> +			      PARSE_OPT_NONEG),
> +		OPT_END()
> +	};
> +
> +	setup_git_directory();
> +
> +	parse_options(argc, argv, NULL, options, test_cache_tree_usage, 0);

Here 'argc' must be updated with the return value of parse_options(),
otherwise the 'if (!argc)' condition doesn't catch what it's supposed
to, and the subsequent 'else if' segfaults when passes the NULL
argv[0] to strcmp().

> +
> +	if (read_cache() < 0)
> +		die("unable to read index file");
> +
> +	get_oid("HEAD", &oid);
> +	tree = parse_tree_indirect(&oid);
> +	for (i = 0; i < count; i++) {
> +		if (fresh)
> +			cache_tree_free(&the_index.cache_tree);
> +
> +		if (!argc)

What if argc > 1?

> +			die("Must specify subcommand");

I think it would be nice to show usage here ...

> +		else if (!strcmp(argv[0], "prime"))
> +			prime_cache_tree(the_repository, &the_index, tree);
> +		else if (!strcmp(argv[0], "update"))
> +			cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
> +		else
> +			die("Unknown command %s", argv[0]);

... and here as well.

> +	}
> +
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 5/5] rebase: use 'skip_cache_tree_update' option
  2022-11-10  1:57   ` [PATCH v2 5/5] rebase: " Victoria Dye via GitGitGadget
@ 2022-11-10 14:40     ` Phillip Wood
  2022-11-10 18:19       ` Victoria Dye
  0 siblings, 1 reply; 31+ messages in thread
From: Phillip Wood @ 2022-11-10 14:40 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau, Victoria Dye

Hi Victoria

On 10/11/2022 01:57, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Enable the 'skip_cache_tree_update' option in both 'do_reset()'
> ('sequencer.c') and 'reset_head()' ('reset.c'). Both of these callers invoke
> 'prime_cache_tree()' after 'unpack_trees()', so we can remove an unnecessary
> cache tree rebuild by skipping 'cache_tree_update()'.
> 
> When testing with 'p3400-rebase.sh' and 'p3404-rebase-interactive.sh', the
> performance change of this update was negligible, likely due to the
> operation being dominated by more expensive operations (like checking out
> trees).

Yes, we only call this once at the beginning of the rebase and then for 
any reset commands and the run time will be dominated by picking commits.

> However, since the change doesn't harm performance, it's worth
> keeping this 'unpack_trees()' usage consistent with others that subsequently
> invoke 'prime_cache_tree()'.

That makes sense

> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>   reset.c     | 1 +
>   sequencer.c | 1 +
>   2 files changed, 2 insertions(+)
> 
> diff --git a/reset.c b/reset.c
> index e3383a93343..5ded23611f3 100644
> --- a/reset.c
> +++ b/reset.c
> @@ -128,6 +128,7 @@ int reset_head(struct repository *r, const struct reset_head_opts *opts)
	unpack_tree_opts.fn = reset_hard ? oneway_merge : twoway_merge;
>   	unpack_tree_opts.update = 1;
>   	unpack_tree_opts.merge = 1;
>   	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
> +	unpack_tree_opts.skip_cache_tree_update = 1;

I've added an extra context line above to show that we do either a 
one-way or two-way merge - is it safe to skip the cache_tree_update for 
the two-way merge? (I'm afraid I seem to have forgotten everything I 
learnt about prime_cache_tree() and cache_tree_update() when we 
discussed this optimization before).

Best Wishes

Phillip

>   	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
>   	if (reset_hard)
>   		unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
> diff --git a/sequencer.c b/sequencer.c
> index e658df7e8ff..3f7a73ce4e1 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3750,6 +3750,7 @@ static int do_reset(struct repository *r,
>   	unpack_tree_opts.merge = 1;
>   	unpack_tree_opts.update = 1;
>   	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
> +	unpack_tree_opts.skip_cache_tree_update = 1;
>   	init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
>   
>   	if (repo_read_index_unmerged(r)) {

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-09 22:18   ` Victoria Dye
@ 2022-11-10 14:44     ` Derrick Stolee
  0 siblings, 0 replies; 31+ messages in thread
From: Derrick Stolee @ 2022-11-10 14:44 UTC (permalink / raw)
  To: Victoria Dye, Victoria Dye via GitGitGadget, git
  Cc: gitster, phillip.wood123, jonathantanmy

On 11/9/2022 5:18 PM, Victoria Dye wrote:
> Derrick Stolee wrote:
>> On 11/8/2022 5:44 PM, Victoria Dye via GitGitGadget wrote:
>>> Following up on a discussion [1] around cache tree refreshes in 'git reset',
>>> this series updates callers of 'unpack_trees()' to skip its internal
>>> invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
>>> immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
>>> expensive operation, and it is redundant when 'prime_cache_tree()' clears
>>> and rebuilds the cache tree from scratch immediately after.
>>>
>>> The first patch adds a test directly comparing the execution time of
>>> 'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
>>> that on a fully-valid cache tree, they perform the same, but on a
>>> fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
>>> (although both are so fast that the total execution time of 100 invocations
>>> is needed to compare the results in the default perf repo).
>>
>> One thing I found interesting is how you needed 200 iterations to show
>> a meaningful change in this test script, but in the case of 'git reset'
>> we can see sizeable improvements even with a single iteration.
> 
> All of the new performance tests run with multiple iterations: 20 for reset
> (10 iterations of two resets each), 100 for read-tree, 200 for the
> comparison of 'cache_tree_update()' & 'prime_cache_tree()'. Those counts
> were picked mostly by trial-and-error, to strike a balance of "the test
> doesn't take too long to run" and "the change in execution time is clearly
> visible in the results."

Thanks for pointing out my misunderstanding. I missed the repeat counts
because 2-3 seconds "seemed right" based on performance tests of large
monorepos, but clearly that's not right when using the Git repository for
performance tests.
>> Is there something about this test that is artificially speeding up
>> these iterations? Perhaps the index has up-to-date filesystem information
>> that allows these methods to avoid filesystem interactions that are
>> necessary in the 'git reset' case?
> 
> I would expect the "cache_tree_update, invalid" test's execution time, when
> scaled to the iterations of 'read-tree' and 'reset', to match the change in
> timing of those commands, but the command tests are reporting *much* larger
> improvements (e.g., I'd expect a 0.27s improvement in 'git read-tree', but
> the results are *consistently* >=0.9s).
> 
> Per trace2 logs, a single invocation of 'read-tree' matching the one added
> in 'p0006' spent 0.010108s in 'cache_tree_update()'. Over 100 iterations,
> the total time would be ~1.01s, which lines up with the 'p0006' test
> results. However, the trace2 results for "test-tool cache-tree --count 3
> --fresh --update" show the first iteration taking 0.013060s (looks good),
> then the next taking 0.003755s, then 0.004026s (_much_ faster than
> expected).
> 
> To be honest, I can't figure out what's going on there. It might be some
> kind of runtime/memory optimization with repeatedly rebuilding the same
> cache tree (doesn't seem to be compiler optimization, since the speedup
> still happens with '-O0'). The only sure-fire way to avoid it seems to be
> moving the iteration outside of 'test-cache-tree.c' and into 'p0090'.
> Unfortunately, the command initialization overhead *really* slows things
> down, but I can add a "control" test (with no cache tree refresh) to
> quantify how long that initialization takes.

Getting unit-level performance tests is always tricky. Sometimes the best
way to do it is to collect a sample using GIT_TRACE2_PERF and then manually
collect the region times. It could be a fun project to integrate region
measurements into the performance test suite instead of only end-to-end
timings.
 
> While looking into this, I found a few other things I'd like to add to/fix
> in that test (add a "partially-invalidated" cache tree case, use the
> original cache tree OID in 'prime_cache_tree()' rather than the OID at
> HEAD), so I'll re-roll with those + the updated iteration logic.

Taking a look now. Thanks!

-Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-11-10  2:12   ` [PATCH v2 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Taylor Blau
@ 2022-11-10 17:26   ` Derrick Stolee
  2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
  7 siblings, 0 replies; 31+ messages in thread
From: Derrick Stolee @ 2022-11-10 17:26 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: gitster, phillip.wood123, jonathantanmy, Taylor Blau,
	Victoria Dye

On 11/9/2022 8:57 PM, Victoria Dye via GitGitGadget wrote:
> Changes since V1
> ================
> 
>  * Rewrote 'p0090' to more accurately and reliably test 'prime_cache_tree()'
>    vs. 'cache_tree_update()'.
>    * Moved iterative cache tree update out of C and into the shell tests (to
>      avoid potential runtime optimizations)
>    * Added a "control" test to document how much of the execution time is
>      startup overhead
>    * Added tests demonstrating performance in partially-invalid cache trees.
>  * Fixed the use of 'prime_cache_tree()' in 'test-tool cache-tree', changing
>    it from using the tree at HEAD to the current cache tree.

I did a re-read of this series and it looks good to me.

Thanks for doing this investigation!
-Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 5/5] rebase: use 'skip_cache_tree_update' option
  2022-11-10 14:40     ` Phillip Wood
@ 2022-11-10 18:19       ` Victoria Dye
  0 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye @ 2022-11-10 18:19 UTC (permalink / raw)
  To: phillip.wood, Victoria Dye via GitGitGadget, git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau

Phillip Wood wrote:
> Hi Victoria
> 
> On 10/11/2022 01:57, Victoria Dye via GitGitGadget wrote:
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>   reset.c     | 1 +
>>   sequencer.c | 1 +
>>   2 files changed, 2 insertions(+)
>>
>> diff --git a/reset.c b/reset.c
>> index e3383a93343..5ded23611f3 100644
>> --- a/reset.c
>> +++ b/reset.c
>> @@ -128,6 +128,7 @@ int reset_head(struct repository *r, const struct reset_head_opts *opts)
>>       unpack_tree_opts.fn = reset_hard ? oneway_merge : twoway_merge;
>>       unpack_tree_opts.update = 1;
>>       unpack_tree_opts.merge = 1;
>>       unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
>> +     unpack_tree_opts.skip_cache_tree_update = 1;
> 
> I've added an extra context line above to show that we do either a one-way
> or two-way merge - is it safe to skip the cache_tree_update for the
> two-way merge? (I'm afraid I seem to have forgotten everything I learnt
> about prime_cache_tree() and cache_tree_update() when we discussed this
> optimization before).

Yes - 'prime_cache_tree()' is called immediately after 'unpack_trees()' in
both the one-way and two-way merge cases. Because 'prime_cache_tree()'
unconditionally clears the cache tree and rebuilds it from scratch,
repairing the cache tree with 'cache_tree_update()' at the end of
'unpack_trees()' is unnecessary.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
                     ` (6 preceding siblings ...)
  2022-11-10 17:26   ` Derrick Stolee
@ 2022-11-10 19:06   ` Victoria Dye via GitGitGadget
  2022-11-10 19:06     ` [PATCH v3 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
                       ` (6 more replies)
  7 siblings, 7 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10 19:06 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	szeder.dev, Taylor Blau, Victoria Dye

Following up on a discussion [1] around cache tree refreshes in 'git reset',
this series updates callers of 'unpack_trees()' to skip its internal
invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
expensive operation, and it is redundant when 'prime_cache_tree()' clears
and rebuilds the cache tree from scratch immediately after.

The first patch adds a test directly comparing the execution time of
'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
that on a fully-valid cache tree, they perform the same, but on a partially-
or fully-invalid cache tree (the more likely case in commands with the
aforementioned redundancy), 'prime_cache_tree()' is faster.

The second patch introduces the 'skip_cache_tree_update' option for
'unpack_trees()', but does not use it yet.

The remaining three patches update callers that make the aforementioned
redundant cache tree updates. The performance impact on these callers ranges
from "negligible" (in 'rebase') to "substantial" (in 'read-tree') - more
details can be found in the commit messages of the patch associated with the
affected code path.


Changes since V2
================

 * Cleaned up option handling & provided more informative error messages in
   'test-tool cache-tree'. The changes don't affect any behavior in the
   added tests & 'test-tool cache-tree' won't be used outside of
   development, but the improvements here will help future readers avoid
   propagating error-prone implementations.
   * Note that the suggestion to change the "unknown subcommand" error to a
     'usage()' error was not taken, as it would be somewhat cumbersome to
     use a formatted string with it. This is in line with other custom
     subcommand parsing in Git, such as in 'fsmonitor--daemon.c'.


Changes since V1
================

 * Rewrote 'p0090' to more accurately and reliably test 'prime_cache_tree()'
   vs. 'cache_tree_update()'.
   * Moved iterative cache tree update out of C and into the shell tests (to
     avoid potential runtime optimizations)
   * Added a "control" test to document how much of the execution time is
     startup overhead
   * Added tests demonstrating performance in partially-invalid cache trees.
 * Fixed the use of 'prime_cache_tree()' in 'test-tool cache-tree', changing
   it from using the tree at HEAD to the current cache tree.

Thanks!

 * Victoria

[1] https://lore.kernel.org/git/xmqqlf30edvf.fsf@gitster.g/ [2]
https://lore.kernel.org/git/f4881b7455b9d33c8a53a91eda7fbdfc5d11382c.1627066238.git.jonathantanmy@google.com/

Victoria Dye (5):
  cache-tree: add perf test comparing update and prime
  unpack-trees: add 'skip_cache_tree_update' option
  reset: use 'skip_cache_tree_update' option
  read-tree: use 'skip_cache_tree_update' option
  rebase: use 'skip_cache_tree_update' option

 Makefile                           |  1 +
 builtin/read-tree.c                |  4 ++
 builtin/reset.c                    |  2 +
 reset.c                            |  1 +
 sequencer.c                        |  1 +
 t/helper/test-cache-tree.c         | 64 ++++++++++++++++++++++++++++++
 t/helper/test-tool.c               |  1 +
 t/helper/test-tool.h               |  1 +
 t/perf/p0006-read-tree-checkout.sh |  8 ++++
 t/perf/p0090-cache-tree.sh         | 36 +++++++++++++++++
 t/perf/p7102-reset.sh              | 21 ++++++++++
 t/t1022-read-tree-partial-clone.sh |  2 +-
 unpack-trees.c                     |  3 +-
 unpack-trees.h                     |  3 +-
 14 files changed, 145 insertions(+), 3 deletions(-)
 create mode 100644 t/helper/test-cache-tree.c
 create mode 100755 t/perf/p0090-cache-tree.sh
 create mode 100755 t/perf/p7102-reset.sh


base-commit: 3b08839926fcc7cc48cf4c759737c1a71af430c1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1411%2Fvdye%2Ffeature%2Fcache-tree-optimization-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1411/vdye/feature/cache-tree-optimization-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1411

Range-diff vs v2:

 1:  833519d87c8 ! 1:  2b48a684156 cache-tree: add perf test comparing update and prime
     @@ Commit message
          partially invalid (e.g., 'git reset --hard'), 'prime_cache_tree()' will
          likely perform better than 'cache_tree_update()' in typical cases.
      
     +    Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## Makefile ##
     @@ t/helper/test-cache-tree.c (new)
      +
      +	setup_git_directory();
      +
     -+	parse_options(argc, argv, NULL, options, test_cache_tree_usage, 0);
     ++	argc = parse_options(argc, argv, NULL, options, test_cache_tree_usage, 0);
      +
      +	if (read_cache() < 0)
     -+		die("unable to read index file");
     ++		die(_("unable to read index file"));
      +
      +	oidcpy(&oid, &the_index.cache_tree->oid);
      +	tree = parse_tree_indirect(&oid);
     @@ t/helper/test-cache-tree.c (new)
      +			cache_tree_invalidate_path(&the_index, the_index.cache[i * interval]->name);
      +	}
      +
     -+	if (!argc)
     -+		die("Must specify subcommand");
     ++	if (argc != 1)
     ++		usage_with_options(test_cache_tree_usage, options);
      +	else if (!strcmp(argv[0], "prime"))
      +		prime_cache_tree(the_repository, &the_index, tree);
      +	else if (!strcmp(argv[0], "update"))
      +		cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
      +	/* use "control" subcommand to specify no-op */
      +	else if (!!strcmp(argv[0], "control"))
     -+		die("Unknown command %s", argv[0]);
     ++		die(_("Unhandled subcommand '%s'"), argv[0]);
      +
      +	return 0;
      +}
 2:  b015a4f531c = 2:  0e03614f0fd unpack-trees: add 'skip_cache_tree_update' option
 3:  4f6039971b8 = 3:  386f18ca36a reset: use 'skip_cache_tree_update' option
 4:  5a646bc47c9 = 4:  ea5c82ce992 read-tree: use 'skip_cache_tree_update' option
 5:  fffe2fc17ed = 5:  100c01e936c rebase: use 'skip_cache_tree_update' option

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 1/5] cache-tree: add perf test comparing update and prime
  2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
@ 2022-11-10 19:06     ` Victoria Dye via GitGitGadget
  2022-11-10 19:06     ` [PATCH v3 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10 19:06 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	szeder.dev, Taylor Blau, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add a performance test comparing the execution times of 'prime_cache_tree()'
and 'cache_tree_update(_, WRITE_TREE_SILENT | WRITE_TREE_REPAIR)'. The goal
of comparing these two is to identify which is the faster method for
rebuilding an invalid cache tree, ultimately to remove one when both are
(reundantly) called in immediate succession.

Both methods are fast, so the new tests in 'p0090-cache-tree.sh' must call
each tested function multiple times to ensure the reported times (to 0.01s
resolution) convey the differences between them.

The tests compare the timing of a 'test-tool cache-tree' run as a no-op (to
capture a baseline for the overhead associated with running the tool),
'cache_tree_update()', and 'prime_cache_tree()' on four scenarios:

- A completely valid cache tree
- A cache tree with 2 invalid paths
- A cache tree with 50 invalid paths
- A completely empty cache tree

Example results:

Test                                        this tree
-----------------------------------------------------------
0090.2: no-op, clean                        1.27(0.48+0.52)
0090.3: prime_cache_tree, clean             2.02(0.83+0.85)
0090.4: cache_tree_update, clean            1.30(0.49+0.54)
0090.5: no-op, invalidate 2                 1.29(0.48+0.54)
0090.6: prime_cache_tree, invalidate 2      1.98(0.81+0.83)
0090.7: cache_tree_update, invalidate 2     2.12(0.94+0.86)
0090.8: no-op, invalidate 50                1.32(0.50+0.55)
0090.9: prime_cache_tree, invalidate 50     2.10(0.86+0.89)
0090.10: cache_tree_update, invalidate 50   2.35(1.14+0.90)
0090.11: no-op, empty                       1.33(0.50+0.54)
0090.12: prime_cache_tree, empty            2.04(0.84+0.87)
0090.13: cache_tree_update, empty           2.51(1.27+0.92)

These timings show that, while 'cache_tree_update()' is faster when the
cache tree is completely valid, it is equal to or slower than
'prime_cache_tree()' when there are any invalid paths. Since the redundant
calls are mostly in scenarios where the cache tree will be at least
partially invalid (e.g., 'git reset --hard'), 'prime_cache_tree()' will
likely perform better than 'cache_tree_update()' in typical cases.

Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Makefile                   |  1 +
 t/helper/test-cache-tree.c | 64 ++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c       |  1 +
 t/helper/test-tool.h       |  1 +
 t/perf/p0090-cache-tree.sh | 36 +++++++++++++++++++++
 5 files changed, 103 insertions(+)
 create mode 100644 t/helper/test-cache-tree.c
 create mode 100755 t/perf/p0090-cache-tree.sh

diff --git a/Makefile b/Makefile
index 4927379184c..3639c7c2a94 100644
--- a/Makefile
+++ b/Makefile
@@ -723,6 +723,7 @@ TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
 TEST_BUILTINS_OBJS += test-bundle-uri.o
+TEST_BUILTINS_OBJS += test-cache-tree.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/t/helper/test-cache-tree.c b/t/helper/test-cache-tree.c
new file mode 100644
index 00000000000..93051b25f56
--- /dev/null
+++ b/t/helper/test-cache-tree.c
@@ -0,0 +1,64 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "tree.h"
+#include "cache-tree.h"
+#include "parse-options.h"
+
+static char const * const test_cache_tree_usage[] = {
+	N_("test-tool cache-tree <options> (control|prime|update)"),
+	NULL
+};
+
+int cmd__cache_tree(int argc, const char **argv)
+{
+	struct object_id oid;
+	struct tree *tree;
+	int empty = 0;
+	int invalidate_qty = 0;
+	int i;
+
+	struct option options[] = {
+		OPT_BOOL(0, "empty", &empty,
+			 N_("clear the cache tree before each iteration")),
+		OPT_INTEGER_F(0, "invalidate", &invalidate_qty,
+			      N_("number of entries in the cache tree to invalidate (default 0)"),
+			      PARSE_OPT_NONEG),
+		OPT_END()
+	};
+
+	setup_git_directory();
+
+	argc = parse_options(argc, argv, NULL, options, test_cache_tree_usage, 0);
+
+	if (read_cache() < 0)
+		die(_("unable to read index file"));
+
+	oidcpy(&oid, &the_index.cache_tree->oid);
+	tree = parse_tree_indirect(&oid);
+	if (!tree)
+		die(_("not a tree object: %s"), oid_to_hex(&oid));
+
+	if (empty) {
+		/* clear the cache tree & allocate a new one */
+		cache_tree_free(&the_index.cache_tree);
+		the_index.cache_tree = cache_tree();
+	} else if (invalidate_qty) {
+		/* invalidate the specified number of unique paths */
+		float f_interval = (float)the_index.cache_nr / invalidate_qty;
+		int interval = f_interval < 1.0 ? 1 : (int)f_interval;
+		for (i = 0; i < invalidate_qty && i * interval < the_index.cache_nr; i++)
+			cache_tree_invalidate_path(&the_index, the_index.cache[i * interval]->name);
+	}
+
+	if (argc != 1)
+		usage_with_options(test_cache_tree_usage, options);
+	else if (!strcmp(argv[0], "prime"))
+		prime_cache_tree(the_repository, &the_index, tree);
+	else if (!strcmp(argv[0], "update"))
+		cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
+	/* use "control" subcommand to specify no-op */
+	else if (!!strcmp(argv[0], "control"))
+		die(_("Unhandled subcommand '%s'"), argv[0]);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 01cda9358df..547a3be1c8b 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -14,6 +14,7 @@ static struct test_cmd cmds[] = {
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
 	{ "bundle-uri", cmd__bundle_uri },
+	{ "cache-tree", cmd__cache_tree },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ca2948066fd..e44e1d896d3 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -8,6 +8,7 @@ int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
 int cmd__bundle_uri(int argc, const char **argv);
+int cmd__cache_tree(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/perf/p0090-cache-tree.sh b/t/perf/p0090-cache-tree.sh
new file mode 100755
index 00000000000..a8eabca2c4d
--- /dev/null
+++ b/t/perf/p0090-cache-tree.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description="Tests performance of cache tree update operations"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+count=100
+
+test_expect_success 'setup cache tree' '
+	git write-tree
+'
+
+test_cache_tree () {
+	test_perf "$1, $3" "
+		for i in \$(test_seq $count)
+		do
+			test-tool cache-tree $4 $2
+		done
+	"
+}
+
+test_cache_tree_update_functions () {
+	test_cache_tree 'no-op' 'control' "$1" "$2"
+	test_cache_tree 'prime_cache_tree' 'prime' "$1" "$2"
+	test_cache_tree 'cache_tree_update' 'update' "$1" "$2"
+}
+
+test_cache_tree_update_functions "clean" ""
+test_cache_tree_update_functions "invalidate 2" "--invalidate 2"
+test_cache_tree_update_functions "invalidate 50" "--invalidate 50"
+test_cache_tree_update_functions "empty" "--empty"
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 2/5] unpack-trees: add 'skip_cache_tree_update' option
  2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
  2022-11-10 19:06     ` [PATCH v3 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
@ 2022-11-10 19:06     ` Victoria Dye via GitGitGadget
  2022-11-10 19:06     ` [PATCH v3 3/5] reset: use " Victoria Dye via GitGitGadget
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10 19:06 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	szeder.dev, Taylor Blau, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add (disabled by default) option to skip the 'cache_tree_update()' at the
end of 'unpack_trees()'. In many cases, this cache tree update is redundant
because the caller of 'unpack_trees()' immediately follows it with
'prime_cache_tree()', rebuilding the entire cache tree from scratch. While
these operations aren't the most expensive part of operations like 'git
reset', the duplicate calls still create a minor unnecessary slowdown.

Introduce an option for callers to skip the 'cache_tree_update()' in
'unpack_trees()' if it is redundant (that is, if 'prime_cache_tree()' is
called afterwards). At the moment, no 'unpack_trees()' callers use the new
option; they will be updated in subsequent patches.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 3 ++-
 unpack-trees.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index bae812156c4..8a762aa0772 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2043,7 +2043,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		if (!ret) {
 			if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
 				cache_tree_verify(the_repository, &o->result);
-			if (!cache_tree_fully_valid(o->result.cache_tree))
+			if (!o->skip_cache_tree_update &&
+			    !cache_tree_fully_valid(o->result.cache_tree))
 				cache_tree_update(&o->result,
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
diff --git a/unpack-trees.h b/unpack-trees.h
index efb9edfbb27..6ab0d74c84d 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -71,7 +71,8 @@ struct unpack_trees_options {
 		     quiet,
 		     exiting_early,
 		     show_all_errors,
-		     dry_run;
+		     dry_run,
+		     skip_cache_tree_update;
 	enum unpack_trees_reset_type reset;
 	const char *prefix;
 	int cache_bottom;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 3/5] reset: use 'skip_cache_tree_update' option
  2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
  2022-11-10 19:06     ` [PATCH v3 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
  2022-11-10 19:06     ` [PATCH v3 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
@ 2022-11-10 19:06     ` Victoria Dye via GitGitGadget
  2022-11-10 19:06     ` [PATCH v3 4/5] read-tree: " Victoria Dye via GitGitGadget
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10 19:06 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	szeder.dev, Taylor Blau, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Enable the 'skip_cache_tree_update' option in the variants that call
'prime_cache_tree()' after 'unpack_trees()' (specifically, 'git reset
--mixed' and 'git reset --hard'). This avoids redundantly rebuilding the
cache tree in both 'cache_tree_update()' at the end of 'unpack_trees()' and
in 'prime_cache_tree()', resulting in a small (but consistent) performance
improvement. From the newly-added 'p7102-reset.sh' test:

Test                         before            after
--------------------------------------------------------------------
7102.1: reset --hard (...)   2.11(0.40+1.54)   1.97(0.38+1.47) -6.6%

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c       |  2 ++
 t/perf/p7102-reset.sh | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)
 create mode 100755 t/perf/p7102-reset.sh

diff --git a/builtin/reset.c b/builtin/reset.c
index fdce6f8c856..ab027774824 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -73,9 +73,11 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 	case HARD:
 		opts.update = 1;
 		opts.reset = UNPACK_RESET_OVERWRITE_UNTRACKED;
+		opts.skip_cache_tree_update = 1;
 		break;
 	case MIXED:
 		opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
+		opts.skip_cache_tree_update = 1;
 		/* but opts.update=0, so working tree not updated */
 		break;
 	default:
diff --git a/t/perf/p7102-reset.sh b/t/perf/p7102-reset.sh
new file mode 100755
index 00000000000..9b039e8691f
--- /dev/null
+++ b/t/perf/p7102-reset.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+
+test_description='performance of reset'
+. ./perf-lib.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+test_perf 'reset --hard with change in tree' '
+	base=$(git rev-parse HEAD) &&
+	test_commit --no-tag A &&
+	new=$(git rev-parse HEAD) &&
+
+	for i in $(test_seq 10)
+	do
+		git reset --hard $new &&
+		git reset --hard $base || return $?
+	done
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 4/5] read-tree: use 'skip_cache_tree_update' option
  2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-11-10 19:06     ` [PATCH v3 3/5] reset: use " Victoria Dye via GitGitGadget
@ 2022-11-10 19:06     ` Victoria Dye via GitGitGadget
  2022-11-10 19:06     ` [PATCH v3 5/5] rebase: " Victoria Dye via GitGitGadget
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10 19:06 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	szeder.dev, Taylor Blau, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

When running 'read-tree' with a single tree and no prefix,
'prime_cache_tree()' is called after the tree is unpacked. In that
situation, skip a redundant call to 'cache_tree_update()' in
'unpack_trees()' by enabling the 'skip_cache_tree_update' unpack option.

Removing the redundant cache tree update provides a substantial performance
improvement to 'git read-tree <tree-ish>', as shown by a test added to
'p0006-read-tree-checkout.sh':

Test                          before            after
----------------------------------------------------------------------
read-tree br_ballast_plus_1   3.94(1.80+1.57)   3.00(1.14+1.28) -23.9%

Note that the 'read-tree' in 't1022-read-tree-partial-clone.sh' is updated
to read two trees, rather than one. The test was first introduced in
d3da223f221 (cache-tree: prefetch in partial clone read-tree, 2021-07-23) to
exercise the 'cache_tree_update()' code path, as used in 'git merge'. Since
this patch drops the call to 'cache_tree_update()' in single-tree 'git
read-tree', change the test to use the two-tree variant so that
'cache_tree_update()' is called as intended.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/read-tree.c                | 4 ++++
 t/perf/p0006-read-tree-checkout.sh | 8 ++++++++
 t/t1022-read-tree-partial-clone.sh | 2 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index f4cbe460b97..45c6652444b 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -249,6 +249,10 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	if (opts.debug_unpack)
 		opts.fn = debug_merge;
 
+	/* If we're going to prime_cache_tree later, skip cache tree update */
+	if (nr_trees == 1 && !opts.prefix)
+		opts.skip_cache_tree_update = 1;
+
 	cache_tree_free(&active_cache_tree);
 	for (i = 0; i < nr_trees; i++) {
 		struct tree *tree = trees[i];
diff --git a/t/perf/p0006-read-tree-checkout.sh b/t/perf/p0006-read-tree-checkout.sh
index c481c012d2f..325566e18eb 100755
--- a/t/perf/p0006-read-tree-checkout.sh
+++ b/t/perf/p0006-read-tree-checkout.sh
@@ -49,6 +49,14 @@ test_perf "read-tree br_base br_ballast ($nr_files)" '
 	git read-tree -n -m br_base br_ballast
 '
 
+test_perf "read-tree br_ballast_plus_1 ($nr_files)" '
+	# Run read-tree 100 times for clearer performance results & comparisons
+	for i in  $(test_seq 100)
+	do
+		git read-tree -n -m br_ballast_plus_1 || return 1
+	done
+'
+
 test_perf "switch between br_base br_ballast ($nr_files)" '
 	git checkout -q br_base &&
 	git checkout -q br_ballast
diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh
index a9953b6a71c..da539716359 100755
--- a/t/t1022-read-tree-partial-clone.sh
+++ b/t/t1022-read-tree-partial-clone.sh
@@ -19,7 +19,7 @@ test_expect_success 'read-tree in partial clone prefetches in one batch' '
 	git -C server config uploadpack.allowfilter 1 &&
 	git -C server config uploadpack.allowanysha1inwant 1 &&
 	git clone --bare --filter=blob:none "file://$(pwd)/server" client &&
-	GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE &&
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE $TREE &&
 
 	# "done" marks the end of negotiation (once per fetch). Expect that
 	# only one fetch occurs.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 5/5] rebase: use 'skip_cache_tree_update' option
  2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-11-10 19:06     ` [PATCH v3 4/5] read-tree: " Victoria Dye via GitGitGadget
@ 2022-11-10 19:06     ` Victoria Dye via GitGitGadget
  2022-11-10 19:50     ` [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after SZEDER Gábor
  2022-11-11  2:50     ` Taylor Blau
  6 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-11-10 19:06 UTC (permalink / raw)
  To: git
  Cc: gitster, phillip.wood123, derrickstolee, jonathantanmy,
	szeder.dev, Taylor Blau, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Enable the 'skip_cache_tree_update' option in both 'do_reset()'
('sequencer.c') and 'reset_head()' ('reset.c'). Both of these callers invoke
'prime_cache_tree()' after 'unpack_trees()', so we can remove an unnecessary
cache tree rebuild by skipping 'cache_tree_update()'.

When testing with 'p3400-rebase.sh' and 'p3404-rebase-interactive.sh', the
performance change of this update was negligible, likely due to the
operation being dominated by more expensive operations (like checking out
trees). However, since the change doesn't harm performance, it's worth
keeping this 'unpack_trees()' usage consistent with others that subsequently
invoke 'prime_cache_tree()'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 reset.c     | 1 +
 sequencer.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/reset.c b/reset.c
index e3383a93343..5ded23611f3 100644
--- a/reset.c
+++ b/reset.c
@@ -128,6 +128,7 @@ int reset_head(struct repository *r, const struct reset_head_opts *opts)
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.merge = 1;
 	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
+	unpack_tree_opts.skip_cache_tree_update = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
 	if (reset_hard)
 		unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
diff --git a/sequencer.c b/sequencer.c
index e658df7e8ff..3f7a73ce4e1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3750,6 +3750,7 @@ static int do_reset(struct repository *r,
 	unpack_tree_opts.merge = 1;
 	unpack_tree_opts.update = 1;
 	unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
+	unpack_tree_opts.skip_cache_tree_update = 1;
 	init_checkout_metadata(&unpack_tree_opts.meta, name, &oid, NULL);
 
 	if (repo_read_index_unmerged(r)) {
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-11-10 19:06     ` [PATCH v3 5/5] rebase: " Victoria Dye via GitGitGadget
@ 2022-11-10 19:50     ` SZEDER Gábor
  2022-11-10 20:54       ` Victoria Dye
  2022-11-11  2:50     ` Taylor Blau
  6 siblings, 1 reply; 31+ messages in thread
From: SZEDER Gábor @ 2022-11-10 19:50 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau, Victoria Dye

On Thu, Nov 10, 2022 at 07:06:00PM +0000, Victoria Dye via GitGitGadget wrote:
> Changes since V2
> ================
> 
>  * Cleaned up option handling & provided more informative error messages in
>    'test-tool cache-tree'. The changes don't affect any behavior in the
>    added tests & 'test-tool cache-tree' won't be used outside of
>    development, but the improvements here will help future readers avoid
>    propagating error-prone implementations.
>    * Note that the suggestion to change the "unknown subcommand" error to a
>      'usage()' error was not taken, as it would be somewhat cumbersome to
>      use a formatted string with it.

I'm not sure I understand what's cumbersome.  It's as simple as:

   if (...) {
       error(_("unknown subcommand: `%s'"), argv[0]);
       usage_with_options(test_cache_tree_usage, options);
   }

>      This is in line with other custom
>      subcommand parsing in Git, such as in 'fsmonitor--daemon.c'.

The option parsing in 'fsmonitor--daemon.c' is broken, please don't
consider it as an example to follow.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-10 19:50     ` [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after SZEDER Gábor
@ 2022-11-10 20:54       ` Victoria Dye
  0 siblings, 0 replies; 31+ messages in thread
From: Victoria Dye @ 2022-11-10 20:54 UTC (permalink / raw)
  To: SZEDER Gábor, Victoria Dye via GitGitGadget
  Cc: git, gitster, phillip.wood123, derrickstolee, jonathantanmy,
	Taylor Blau

SZEDER Gábor wrote:
> On Thu, Nov 10, 2022 at 07:06:00PM +0000, Victoria Dye via GitGitGadget wrote:
>> Changes since V2
>> ================
>>
>>  * Cleaned up option handling & provided more informative error messages in
>>    'test-tool cache-tree'. The changes don't affect any behavior in the
>>    added tests & 'test-tool cache-tree' won't be used outside of
>>    development, but the improvements here will help future readers avoid
>>    propagating error-prone implementations.
>>    * Note that the suggestion to change the "unknown subcommand" error to a
>>      'usage()' error was not taken, as it would be somewhat cumbersome to
>>      use a formatted string with it.
> 
> I'm not sure I understand what's cumbersome.  It's as simple as:
> 
>    if (...) {
>        error(_("unknown subcommand: `%s'"), argv[0]);
>        usage_with_options(test_cache_tree_usage, options);
>    }

To be honest, the cumbersome approach I was thinking of was 'sprintf()'-ing
the subcommand into the string and calling 'usage()' with that - your
suggestion is certainly much simpler. However, as a matter of personal
preference, I still think the 'die()' is sufficient in the context of this
test helper (especially given that other test helpers do the same).

> 
>>      This is in line with other custom
>>      subcommand parsing in Git, such as in 'fsmonitor--daemon.c'.
> 
> The option parsing in 'fsmonitor--daemon.c' is broken, please don't
> consider it as an example to follow.

While I understand your desire to helpfully guide users, I don't see
anything to suggest that particular example is "broken." The error conveys
the cause of the problem to a user, who could then run without arguments (or
with -h) to see what the valid subcommands are. And, in the case of this
test helper, I'm not particularly concerned with perfecting the (already
subjective) user experience, given that it's an internal-only tool.

If there are examples of proper usage patterns that future commands should
follow, I'd recommend updating 'CodingGuidelines' and/or
'MyFirstContribution' to mention them. Codifying recommendations like that
can help avoid churn in reviews and, long-term, push Git to align on a
uniform style.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-11-10 19:50     ` [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after SZEDER Gábor
@ 2022-11-11  2:50     ` Taylor Blau
  2022-11-14  0:08       ` Derrick Stolee
  6 siblings, 1 reply; 31+ messages in thread
From: Taylor Blau @ 2022-11-11  2:50 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, gitster, phillip.wood123, derrickstolee, jonathantanmy,
	szeder.dev, Taylor Blau, Victoria Dye

On Thu, Nov 10, 2022 at 07:06:00PM +0000, Victoria Dye via GitGitGadget wrote:
> Changes since V2
> ================
>
>  * Cleaned up option handling & provided more informative error messages in
>    'test-tool cache-tree'. The changes don't affect any behavior in the
>    added tests & 'test-tool cache-tree' won't be used outside of
>    development, but the improvements here will help future readers avoid
>    propagating error-prone implementations.
>    * Note that the suggestion to change the "unknown subcommand" error to a
>      'usage()' error was not taken, as it would be somewhat cumbersome to
>      use a formatted string with it. This is in line with other custom
>      subcommand parsing in Git, such as in 'fsmonitor--daemon.c'.

Thanks. The range-diff confirms what you say above. So between that and
an affirmative review on the last round, I think we are ready to start
merging this one down.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
  2022-11-11  2:50     ` Taylor Blau
@ 2022-11-14  0:08       ` Derrick Stolee
  0 siblings, 0 replies; 31+ messages in thread
From: Derrick Stolee @ 2022-11-14  0:08 UTC (permalink / raw)
  To: Taylor Blau, Victoria Dye via GitGitGadget
  Cc: git, gitster, phillip.wood123, jonathantanmy, szeder.dev,
	Victoria Dye

On 11/10/22 9:50 PM, Taylor Blau wrote:
> On Thu, Nov 10, 2022 at 07:06:00PM +0000, Victoria Dye via GitGitGadget wrote:
>> Changes since V2
>> ================
>>
>>  * Cleaned up option handling & provided more informative error messages in
>>    'test-tool cache-tree'. The changes don't affect any behavior in the
>>    added tests & 'test-tool cache-tree' won't be used outside of
>>    development, but the improvements here will help future readers avoid
>>    propagating error-prone implementations.
>>    * Note that the suggestion to change the "unknown subcommand" error to a
>>      'usage()' error was not taken, as it would be somewhat cumbersome to
>>      use a formatted string with it. This is in line with other custom
>>      subcommand parsing in Git, such as in 'fsmonitor--daemon.c'.
> 
> Thanks. The range-diff confirms what you say above. So between that and
> an affirmative review on the last round, I think we are ready to start
> merging this one down.

I agree. This version still LGTM.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-11-14  0:08 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
2022-11-10  7:23   ` SZEDER Gábor
2022-11-08 22:44 ` [PATCH 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 3/5] reset: use " Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 4/5] read-tree: " Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 5/5] rebase: " Victoria Dye via GitGitGadget
2022-11-09 15:23 ` [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Derrick Stolee
2022-11-09 22:18   ` Victoria Dye
2022-11-10 14:44     ` Derrick Stolee
2022-11-09 23:01 ` Taylor Blau
2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 3/5] reset: use " Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 4/5] read-tree: " Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 5/5] rebase: " Victoria Dye via GitGitGadget
2022-11-10 14:40     ` Phillip Wood
2022-11-10 18:19       ` Victoria Dye
2022-11-10  2:12   ` [PATCH v2 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Taylor Blau
2022-11-10 17:26   ` Derrick Stolee
2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 3/5] reset: use " Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 4/5] read-tree: " Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 5/5] rebase: " Victoria Dye via GitGitGadget
2022-11-10 19:50     ` [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after SZEDER Gábor
2022-11-10 20:54       ` Victoria Dye
2022-11-11  2:50     ` Taylor Blau
2022-11-14  0:08       ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).