git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <stolee@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v2 0/7] Optimization batch 14: trivial directory resolution
Date: Tue, 13 Jul 2021 19:32:56 +0000	[thread overview]
Message-ID: <pull.988.v2.git.1626204784.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.988.git.1625111177.gitgitgadget@gmail.com>

[This is mostly unchanged since v1; I'm primarily resending since it's been
two weeks and I want to bump it so folks have a chance to notice and review
it. But there was one change, noted below.]

This series depends textually on ort-perf-batch-12, but is semantically
independent. (It is both semantically and textually independent of
ort-perf-batch-13.)

Most of my previous series dramatically accelerated cases with lots of
renames, while providing comparatively minor benefits for cases with few or
no renames. This series is the opposite; it provides huge benefits when
there are few or no renames, and comparatively smaller (though still quite
decent) benefits for cases with many uncached renames.

Changes since v1:

 * Minor tweak to the final patch to correct implicit assumption that rename
   detection running implies all renames were found (rename limits could
   have been exceeded and prevented finding renames)

=== Basic Optimization idea ===

unpack_trees has had a concept of trivial merges for individual files (see
Documentation/technical/trivial-merge.txt). The same idea can be applied in
merge-ort. It'd be really nice to extend that idea to trees as well, as it
could provide a huge performance boost; sadly however, applying it in
general would wreck both regular rename detection (the unmatched side can
have new files that serve as potential destinations in rename detection) and
directory rename detection (the unmatched side could have a new directory
that was moved into it).

If we somehow knew rename detection wasn't needed, we could do trivial
directory resolution. In the past, this wasn't possible. However...

With recent optimizations we have created a possibility to do trivial
directory resolutions in some cases. These came from the addition of the
"skipping irrelevant renames" optimizations (from ort-perf-batch-9 and
ort-perf-batch-10), and in particular noting that we added an ability to
entirely skip rename detection in commit f89b4f2bee ("merge-ort: skip rename
detection entirely if possible", 2021-03-11) when there are no relevant
sources. We can detect if there are no relevant sources without recursing
into the directories in question.

As a cherry on top, the caching of renames (from ort-perf-batch-11) allows
us to cover additional cases.

This series is all about adding all the special checks needed to safely
perform trival directory resolutions.

=== Results ===

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28), the
changes in just this series improves the performance as follows:

                     Before Series           After Series
no-renames:        5.235 s ±  0.042 s   204.2  ms ±  3.0  ms
mega-renames:      9.419 s ±  0.107 s     1.076 s ±  0.015 s
just-one-mega:   480.1  ms ±  3.9  ms   364.1  ms ±  7.0  ms


As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with (for merge-recursive as of git-2.30.0)
were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


Elijah Newren (7):
  merge-ort: resolve paths early when we have sufficient information
  merge-ort: add some more explanations in collect_merge_info_callback()
  merge-ort: add data structures for allowable trivial directory
    resolves
  merge-ort: add a handle_deferred_entries() helper function
  merge-ort: defer recursing into directories when merge base is matched
  merge-ort: avoid recursing into directories when we don't need to
  merge-ort: restart merge with cached renames to reduce process entry
    cost

 merge-ort.c                         | 408 +++++++++++++++++++++++++++-
 t/t6423-merge-rename-directories.sh |   2 +-
 2 files changed, 398 insertions(+), 12 deletions(-)


base-commit: 2eeee12b02e441ac05054a5a5ecbcea6964a1e6b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-988%2Fnewren%2Fort-perf-batch-14-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-988/newren/ort-perf-batch-14-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/988

Range-diff vs v1:

 1:  5dca982c0b = 1:  5dca982c0b merge-ort: resolve paths early when we have sufficient information
 2:  8aea371390 = 2:  8aea371390 merge-ort: add some more explanations in collect_merge_info_callback()
 3:  f7ac01055d = 3:  f7ac01055d merge-ort: add data structures for allowable trivial directory resolves
 4:  7e28323b62 = 4:  7e28323b62 merge-ort: add a handle_deferred_entries() helper function
 5:  317553eadb = 5:  317553eadb merge-ort: defer recursing into directories when merge base is matched
 6:  3409a6cd63 = 6:  3409a6cd63 merge-ort: avoid recursing into directories when we don't need to
 7:  76bc73262c ! 7:  7133f0efa5 merge-ort: restart merge with cached renames to reduce process entry cost
     @@ merge-ort.c: static void detect_regular_renames(struct merge_options *opt,
       	}
       
       	partial_clear_dir_rename_count(&renames->dir_rename_count[side_index]);
     +@@ merge-ort.c: static void detect_regular_renames(struct merge_options *opt,
     + 	trace2_region_leave("diff", "diffcore_rename", opt->repo);
     + 	resolve_diffpair_statuses(&diff_queued_diff);
     + 
     ++	if (diff_opts.needed_rename_limit > 0)
     ++		renames->redo_after_renames = 0;
     + 	if (diff_opts.needed_rename_limit > renames->needed_limit)
     + 		renames->needed_limit = diff_opts.needed_rename_limit;
     + 
      @@ merge-ort.c: static void detect_regular_renames(struct merge_options *opt,
       	diff_queued_diff.nr = 0;
       	diff_queued_diff.queue = NULL;
       	diff_flush(&diff_opts);
      +
     -+	if (renames->redo_after_renames) {
     -+		int i;
     -+		struct diff_filepair *p;
     -+
     -+		renames->redo_after_renames = 2;
     -+		for (i = 0; i < renames->pairs[side_index].nr; ++i) {
     -+			p = renames->pairs[side_index].queue[i];
     -+			possibly_cache_new_pair(renames, p, side_index, NULL);
     -+		}
     -+	}
      +	return 1;
       }
       
     @@ merge-ort.c: static int detect_and_process_renames(struct merge_options *opt,
      +	detection_run |= detect_regular_renames(opt, MERGE_SIDE1);
      +	detection_run |= detect_regular_renames(opt, MERGE_SIDE2);
      +	if (renames->redo_after_renames && detection_run) {
     ++		int i, side;
     ++		struct diff_filepair *p;
     ++
     ++		/* Cache the renames, we found */
     ++		for (side = MERGE_SIDE1; side <= MERGE_SIDE2; side++) {
     ++			for (i = 0; i < renames->pairs[side].nr; ++i) {
     ++				p = renames->pairs[side].queue[i];
     ++				possibly_cache_new_pair(renames, p, side, NULL);
     ++			}
     ++		}
     ++
     ++		/* Restart the merge with the cached renames */
     ++		renames->redo_after_renames = 2;
      +		trace2_region_leave("merge", "regular renames", opt->repo);
      +		goto cleanup;
      +	}

-- 
gitgitgadget

  parent reply	other threads:[~2021-07-13 19:33 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-01  3:46 [PATCH 0/7] Optimization batch 14: trivial directory resolution Elijah Newren via GitGitGadget
2021-07-01  3:46 ` [PATCH 1/7] merge-ort: resolve paths early when we have sufficient information Elijah Newren via GitGitGadget
2021-07-01  3:46 ` [PATCH 2/7] merge-ort: add some more explanations in collect_merge_info_callback() Elijah Newren via GitGitGadget
2021-07-01  3:46 ` [PATCH 3/7] merge-ort: add data structures for allowable trivial directory resolves Elijah Newren via GitGitGadget
2021-07-01  3:46 ` [PATCH 4/7] merge-ort: add a handle_deferred_entries() helper function Elijah Newren via GitGitGadget
2021-07-01  3:46 ` [PATCH 5/7] merge-ort: defer recursing into directories when merge base is matched Elijah Newren via GitGitGadget
2021-07-01  3:46 ` [PATCH 6/7] merge-ort: avoid recursing into directories when we don't need to Elijah Newren via GitGitGadget
2021-07-01  3:46 ` [PATCH 7/7] merge-ort: restart merge with cached renames to reduce process entry cost Elijah Newren via GitGitGadget
2021-07-01 13:21 ` [PATCH 0/7] Optimization batch 14: trivial directory resolution Ævar Arnfjörð Bjarmason
2021-07-01 15:04   ` Elijah Newren
2021-07-01 19:22     ` Elijah Newren
2021-07-13 19:32 ` Elijah Newren via GitGitGadget [this message]
2021-07-13 19:32   ` [PATCH v2 1/7] merge-ort: resolve paths early when we have sufficient information Elijah Newren via GitGitGadget
2021-07-13 19:32   ` [PATCH v2 2/7] merge-ort: add some more explanations in collect_merge_info_callback() Elijah Newren via GitGitGadget
2021-07-13 23:34     ` Bagas Sanjaya
2021-07-14  0:19       ` Elijah Newren
2021-07-13 19:32   ` [PATCH v2 3/7] merge-ort: add data structures for allowable trivial directory resolves Elijah Newren via GitGitGadget
2021-07-15 13:54     ` Derrick Stolee
2021-07-15 15:54       ` Elijah Newren
2021-07-13 19:33   ` [PATCH v2 4/7] merge-ort: add a handle_deferred_entries() helper function Elijah Newren via GitGitGadget
2021-07-15 14:32     ` Derrick Stolee
2021-07-15 15:59       ` Elijah Newren
2021-07-13 19:33   ` [PATCH v2 5/7] merge-ort: defer recursing into directories when merge base is matched Elijah Newren via GitGitGadget
2021-07-15 14:43     ` Derrick Stolee
2021-07-15 16:03       ` Elijah Newren
2021-07-15 17:14         ` Derrick Stolee
2021-07-13 19:33   ` [PATCH v2 6/7] merge-ort: avoid recursing into directories when we don't need to Elijah Newren via GitGitGadget
2021-07-15 14:55     ` Derrick Stolee
2021-07-15 16:28       ` Elijah Newren
2021-07-13 19:33   ` [PATCH v2 7/7] merge-ort: restart merge with cached renames to reduce process entry cost Elijah Newren via GitGitGadget
2021-07-15 15:09     ` Derrick Stolee
2021-07-15 16:53       ` Elijah Newren
2021-07-15 17:19         ` Derrick Stolee
2021-07-15 17:32           ` Elijah Newren
2021-07-16  5:22   ` [PATCH v3 0/7] Optimization batch 14: trivial directory resolution Elijah Newren via GitGitGadget
2021-07-16  5:22     ` [PATCH v3 1/7] merge-ort: resolve paths early when we have sufficient information Elijah Newren via GitGitGadget
2021-07-16  5:22     ` [PATCH v3 2/7] merge-ort: add some more explanations in collect_merge_info_callback() Elijah Newren via GitGitGadget
2021-07-16  5:22     ` [PATCH v3 3/7] merge-ort: add data structures for allowable trivial directory resolves Elijah Newren via GitGitGadget
2021-07-16  5:22     ` [PATCH v3 4/7] merge-ort: add a handle_deferred_entries() helper function Elijah Newren via GitGitGadget
2021-07-16  5:22     ` [PATCH v3 5/7] merge-ort: defer recursing into directories when merge base is matched Elijah Newren via GitGitGadget
2021-07-16  5:22     ` [PATCH v3 6/7] merge-ort: avoid recursing into directories when we don't need to Elijah Newren via GitGitGadget
2021-07-16  5:22     ` [PATCH v3 7/7] merge-ort: restart merge with cached renames to reduce process entry cost Elijah Newren via GitGitGadget
2021-07-20 13:00     ` [PATCH v3 0/7] Optimization batch 14: trivial directory resolution Derrick Stolee
2021-07-20 21:43       ` Junio C Hamano
2021-07-21  4:23     ` [PATCH v4 " Elijah Newren via GitGitGadget
2021-07-21  4:23       ` [PATCH v4 1/7] merge-ort: resolve paths early when we have sufficient information Elijah Newren via GitGitGadget
2021-07-21  4:23       ` [PATCH v4 2/7] merge-ort: add some more explanations in collect_merge_info_callback() Elijah Newren via GitGitGadget
2021-07-21  4:24       ` [PATCH v4 3/7] merge-ort: add data structures for allowable trivial directory resolves Elijah Newren via GitGitGadget
2021-07-21  4:24       ` [PATCH v4 4/7] merge-ort: add a handle_deferred_entries() helper function Elijah Newren via GitGitGadget
2021-07-21  4:24       ` [PATCH v4 5/7] merge-ort: defer recursing into directories when merge base is matched Elijah Newren via GitGitGadget
2021-07-21  4:24       ` [PATCH v4 6/7] merge-ort: avoid recursing into directories when we don't need to Elijah Newren via GitGitGadget
2021-07-21  4:24       ` [PATCH v4 7/7] merge-ort: restart merge with cached renames to reduce process entry cost Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.988.v2.git.1626204784.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).