From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, phillip.wood123@gmail.com,
	derrickstolee@github.com, jonathantanmy@google.com,
	szeder.dev@gmail.com, Taylor Blau <me@ttaylorr.com>,
	Victoria Dye <vdye@github.com>
Subject: [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
Date: Thu, 10 Nov 2022 19:06:00 +0000
Message-ID: <pull.1411.v3.git.1668107165.gitgitgadget@gmail.com>
In-Reply-To: <pull.1411.v2.git.1668045438.gitgitgadget@gmail.com>

Following up on a discussion [1] around cache tree refreshes in 'git reset',
this series updates callers of 'unpack_trees()' to skip its internal
invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
expensive operation, and it is redundant when 'prime_cache_tree()' then
clears and rebuilds the cache tree from scratch.

The first patch adds a test directly comparing the execution time of
'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
that on a fully-valid cache tree, they perform the same, but on a partially-
or fully-invalid cache tree (the more likely case in commands with the
aforementioned redundancy), 'prime_cache_tree()' is faster.

The second patch introduces the 'skip_cache_tree_update' option for
'unpack_trees()', but does not use it yet.
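
As a rough illustration of the idea behind the option (this is not Git's
actual code), the sketch below models the redundancy with hypothetical
stand-in types 'toy_index' and 'toy_unpack_opts' and stub functions. Only
the names 'skip_cache_tree_update', 'unpack_trees()', 'cache_tree_update()'
and 'prime_cache_tree()' come from this series; every 'toy_' identifier and
the call counters are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; Git's real index state carries far more. */
struct toy_index {
	int cache_tree_valid;   /* 1 if the toy cache tree is usable */
	int update_calls;       /* times the in-place update ran */
	int prime_calls;        /* times the full rebuild ran */
};

/* Hypothetical stand-in for the unpack options struct. */
struct toy_unpack_opts {
	unsigned skip_cache_tree_update : 1; /* flag name from the series */
};

/* Stub for cache_tree_update(): repairs the cache tree in place. */
static void toy_cache_tree_update(struct toy_index *istate)
{
	istate->update_calls++;
	istate->cache_tree_valid = 1;
}

/* Stub for prime_cache_tree(): discards and rebuilds the cache tree. */
static void toy_prime_cache_tree(struct toy_index *istate)
{
	istate->prime_calls++;
	istate->cache_tree_valid = 1;
}

/*
 * Stub for unpack_trees(). Simplification: assume the merge always
 * leaves the toy cache tree invalid, then refresh it unless the
 * caller asked to skip that step.
 */
static void toy_unpack_trees(struct toy_index *istate,
			     const struct toy_unpack_opts *o)
{
	assert(istate != NULL && o != NULL);
	istate->cache_tree_valid = 0;
	if (!o->skip_cache_tree_update)
		toy_cache_tree_update(istate);
}

/*
 * A caller shaped like the ones this series touches: it always primes
 * right after unpacking, so the internal update is wasted work.
 */
static void toy_reset_like_caller(struct toy_index *istate, int skip)
{
	struct toy_unpack_opts opts = { 0 };

	opts.skip_cache_tree_update = skip ? 1 : 0;
	toy_unpack_trees(istate, &opts);
	toy_prime_cache_tree(istate);
}
```

In this model, calling 'toy_reset_like_caller()' with 'skip' set ends with
the same valid cache tree but performs only the rebuild, which is the
saving the series is after.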

The remaining three patches update callers that make the aforementioned
redundant cache tree updates. The performance impact on these callers ranges
from "negligible" (in 'rebase') to "substantial" (in 'read-tree'); more
details can be found in the commit message of the patch for each affected
code path.


Changes since V2
================

 * Cleaned up option handling & provided more informative error messages in
   'test-tool cache-tree'. The changes don't affect any behavior in the
   added tests & 'test-tool cache-tree' won't be used outside of
   development, but the improvements here will help future readers avoid
   propagating error-prone implementations.
   * Note that the suggestion to change the "unknown subcommand" error to a
     'usage()' error was not taken, as it would be somewhat cumbersome to
     use a formatted string with it. This is in line with other custom
     subcommand parsing in Git, such as in 'fsmonitor--daemon.c'.
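
For readers who want the shape of that option handling outside of Git, here
is a minimal standalone sketch. It mirrors the subcommand names and the
"Unhandled subcommand" message from the range-diff, but 'toy_dispatch()',
the 'toy_result' enum and the caller-supplied error buffer are hypothetical;
the real helper calls 'usage_with_options()' and 'die()' instead of
returning codes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes standing in for the helper's actions. */
enum toy_result {
	TOY_PRIME,
	TOY_UPDATE,
	TOY_CONTROL,    /* documented no-op */
	TOY_USAGE,      /* wrong number of arguments */
	TOY_UNHANDLED   /* exactly one argument, but not a known name */
};

/*
 * Classify the arguments left over after option parsing. On
 * TOY_UNHANDLED, a formatted message naming the bad subcommand is
 * written to err; otherwise err is set to the empty string.
 */
static enum toy_result toy_dispatch(int argc, const char **argv,
				    char *err, size_t errlen)
{
	assert(err != NULL && errlen > 0);
	err[0] = '\0';

	if (argc != 1)
		return TOY_USAGE;
	if (!strcmp(argv[0], "prime"))
		return TOY_PRIME;
	if (!strcmp(argv[0], "update"))
		return TOY_UPDATE;
	if (!strcmp(argv[0], "control"))
		return TOY_CONTROL;

	snprintf(err, errlen, "Unhandled subcommand '%s'", argv[0]);
	return TOY_UNHANDLED;
}
```

Checking 'argc != 1' first catches both a missing subcommand and stray
extra arguments, which is why the V3 code can drop the separate "Must
specify subcommand" check.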


Changes since V1
================

 * Rewrote 'p0090' to more accurately and reliably test 'prime_cache_tree()'
   vs. 'cache_tree_update()'.
   * Moved iterative cache tree update out of C and into the shell tests (to
     avoid potential runtime optimizations)
   * Added a "control" test to document how much of the execution time is
     startup overhead
   * Added tests demonstrating performance in partially-invalid cache trees.
 * Fixed the use of 'prime_cache_tree()' in 'test-tool cache-tree', changing
   it from using the tree at HEAD to the current cache tree.
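
To make "partially-invalid" concrete: the range-diff shows the helper
invalidating the path of the entry at index 'i * interval'. The sketch
below applies the same stride to a plain array of validity flags. The
function name, the flag array and the loop bound are assumptions for
illustration; the helper's actual loop is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mark every interval-th slot (0, interval, 2 * interval, ...) as
 * invalid. Returns how many slots were invalidated; an interval of 0
 * is treated as "invalidate nothing".
 */
static size_t toy_invalidate_every(int *valid, size_t nr, size_t interval)
{
	size_t i, count = 0;

	assert(valid != NULL);
	if (!interval)
		return 0;

	for (i = 0; i * interval < nr; i++) {
		valid[i * interval] = 0;
		count++;
	}
	return count;
}
```

A small interval invalidates most of the array and a large one only a few
slots, which is the knob a partially-invalid perf test needs.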

Thanks!

- Victoria

[1] https://lore.kernel.org/git/xmqqlf30edvf.fsf@gitster.g/
[2] https://lore.kernel.org/git/f4881b7455b9d33c8a53a91eda7fbdfc5d11382c.1627066238.git.jonathantanmy@google.com/

Victoria Dye (5):
  cache-tree: add perf test comparing update and prime
  unpack-trees: add 'skip_cache_tree_update' option
  reset: use 'skip_cache_tree_update' option
  read-tree: use 'skip_cache_tree_update' option
  rebase: use 'skip_cache_tree_update' option

 Makefile                           |  1 +
 builtin/read-tree.c                |  4 ++
 builtin/reset.c                    |  2 +
 reset.c                            |  1 +
 sequencer.c                        |  1 +
 t/helper/test-cache-tree.c         | 64 ++++++++++++++++++++++++++++++
 t/helper/test-tool.c               |  1 +
 t/helper/test-tool.h               |  1 +
 t/perf/p0006-read-tree-checkout.sh |  8 ++++
 t/perf/p0090-cache-tree.sh         | 36 +++++++++++++++++
 t/perf/p7102-reset.sh              | 21 ++++++++++
 t/t1022-read-tree-partial-clone.sh |  2 +-
 unpack-trees.c                     |  3 +-
 unpack-trees.h                     |  3 +-
 14 files changed, 145 insertions(+), 3 deletions(-)
 create mode 100644 t/helper/test-cache-tree.c
 create mode 100755 t/perf/p0090-cache-tree.sh
 create mode 100755 t/perf/p7102-reset.sh


base-commit: 3b08839926fcc7cc48cf4c759737c1a71af430c1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1411%2Fvdye%2Ffeature%2Fcache-tree-optimization-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1411/vdye/feature/cache-tree-optimization-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1411

Range-diff vs v2:

 1:  833519d87c8 ! 1:  2b48a684156 cache-tree: add perf test comparing update and prime
     @@ Commit message
          partially invalid (e.g., 'git reset --hard'), 'prime_cache_tree()' will
          likely perform better than 'cache_tree_update()' in typical cases.
      
     +    Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## Makefile ##
     @@ t/helper/test-cache-tree.c (new)
      +
      +	setup_git_directory();
      +
     -+	parse_options(argc, argv, NULL, options, test_cache_tree_usage, 0);
     ++	argc = parse_options(argc, argv, NULL, options, test_cache_tree_usage, 0);
      +
      +	if (read_cache() < 0)
     -+		die("unable to read index file");
     ++		die(_("unable to read index file"));
      +
      +	oidcpy(&oid, &the_index.cache_tree->oid);
      +	tree = parse_tree_indirect(&oid);
     @@ t/helper/test-cache-tree.c (new)
      +			cache_tree_invalidate_path(&the_index, the_index.cache[i * interval]->name);
      +	}
      +
     -+	if (!argc)
     -+		die("Must specify subcommand");
     ++	if (argc != 1)
     ++		usage_with_options(test_cache_tree_usage, options);
      +	else if (!strcmp(argv[0], "prime"))
      +		prime_cache_tree(the_repository, &the_index, tree);
      +	else if (!strcmp(argv[0], "update"))
      +		cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
      +	/* use "control" subcommand to specify no-op */
      +	else if (!!strcmp(argv[0], "control"))
     -+		die("Unknown command %s", argv[0]);
     ++		die(_("Unhandled subcommand '%s'"), argv[0]);
      +
      +	return 0;
      +}
 2:  b015a4f531c = 2:  0e03614f0fd unpack-trees: add 'skip_cache_tree_update' option
 3:  4f6039971b8 = 3:  386f18ca36a reset: use 'skip_cache_tree_update' option
 4:  5a646bc47c9 = 4:  ea5c82ce992 read-tree: use 'skip_cache_tree_update' option
 5:  fffe2fc17ed = 5:  100c01e936c rebase: use 'skip_cache_tree_update' option

-- 
gitgitgadget


Thread overview: 31+ messages
2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
2022-11-10  7:23   ` SZEDER Gábor
2022-11-08 22:44 ` [PATCH 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 3/5] reset: use " Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 4/5] read-tree: " Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 5/5] rebase: " Victoria Dye via GitGitGadget
2022-11-09 15:23 ` [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Derrick Stolee
2022-11-09 22:18   ` Victoria Dye
2022-11-10 14:44     ` Derrick Stolee
2022-11-09 23:01 ` Taylor Blau
2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 3/5] reset: use " Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 4/5] read-tree: " Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 5/5] rebase: " Victoria Dye via GitGitGadget
2022-11-10 14:40     ` Phillip Wood
2022-11-10 18:19       ` Victoria Dye
2022-11-10  2:12   ` [PATCH v2 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Taylor Blau
2022-11-10 17:26   ` Derrick Stolee
2022-11-10 19:06   ` Victoria Dye via GitGitGadget [this message]
2022-11-10 19:06     ` [PATCH v3 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 3/5] reset: use " Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 4/5] read-tree: " Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 5/5] rebase: " Victoria Dye via GitGitGadget
2022-11-10 19:50     ` [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after SZEDER Gábor
2022-11-10 20:54       ` Victoria Dye
2022-11-11  2:50     ` Taylor Blau
2022-11-14  0:08       ` Derrick Stolee
