From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <dstolee@microsoft.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Taylor Blau" <me@ttaylorr.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH 0/8] Optimization batch 10: avoid detecting even more irrelevant renames
Date: Sat, 13 Mar 2021 22:22:00 +0000
Message-ID: <pull.853.git.1615674128.gitgitgadget@gmail.com>

This series depends on ort-perf-batch-9.

=== Basic Optimization idea ===

This series adds further special cases where rename detection is
irrelevant because the merge machinery will arrive at the same result
regardless of whether a rename is detected for any of those paths. That
high-level wording makes it sound the same as ort-perf-batch-9, and
basically it is; this series just takes the optimization a step further.

As noted in the last series, there are two reasons that the merge machinery
needs renames:

 * in order to do three-way content merging (pairing appropriate files)
 * in order to find where directories have been renamed

ort-perf-batch-9 provided a rough approximation for the second criterion
that was good enough, but which still left us detecting more renames than
necessary. This series focuses further on that criterion and finds ways to
avoid the need to detect as many renames while still detecting directory
renames identically to before. Thus, this series is an improvement on
"Optimization #2" from my Git Merge 2020 talk[1].
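
To illustrate the "majority rules" idea named in the first patch title,
here is a minimal sketch in Python. It is not the code in
diffcore-rename.c: the function name, the dictionary of per-destination
counts, and the "unknown" parameter are all hypothetical, and the
stopping rule shown (the leader must beat the runner-up even if every
undetected rename goes against it) is an assumption about how such a
check could work.

```python
from collections import Counter


def directory_rename_settled(known_counts, unknown):
    """Return the winning destination directory if it can no longer change.

    known_counts: hypothetical mapping of destination directory -> number
                  of renames already detected from one source directory.
    unknown:      number of files from that source directory whose rename
                  status has not been determined yet.
    """
    ranked = Counter(known_counts).most_common(2)
    if not ranked:
        return None
    best_dir, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Worst case: every still-undetected file turns out to be a rename
    # into the runner-up (or into some directory not seen so far).
    if best > runner_up + unknown:
        return best_dir
    return None
```

Under these assumptions, once the function returns a directory, further
rename detection for the remaining files cannot change which directory
wins, so those files matter only if they are needed for content merging.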

=== Results ===

For the testcases mentioned in commit 557ac03 ("merge-ort: begin performance
work; instrument with trace2_region_* calls", 2020-10-28), the changes in
just this series improve the performance as follows:

                     Before Series           After Series
no-renames:        5.680 s ±  0.096 s     5.665 s ±  0.129 s 
mega-renames:     13.812 s ±  0.162 s    11.435 s ±  0.158 s
just-one-mega:   506.0  ms ±  3.9  ms   494.2  ms ±  6.1  ms


While those results may look somewhat meager, it is important to note that
the previous optimizations have already reduced rename detection time to
nearly 0 for these particular testcases, so there just isn't much left to
improve. The final patch in the series shows an alternate testcase where the
previous optimizations aren't as effective (a cherry-pick of a commit that
simply adds one new empty file), where there was a speedup factor of
approximately 3 due to this series:

                     Before Series           After Series
pick-empty:        1.936 s ±  0.024 s     688.1 ms ±  4.2 ms
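
The speedup factors implied by the two tables above can be checked with
a few lines of Python. The timings are the mean values copied from the
tables (converted to seconds); the helper name is made up for this
sketch and the error bars are ignored.

```python
def speedup(before_s, after_s):
    """Ratio of the mean time before the series to the mean time after."""
    return before_s / after_s


# Mean timings in seconds, taken from the tables in this message.
timings = {
    "no-renames": (5.680, 5.665),
    "mega-renames": (13.812, 11.435),
    "just-one-mega": (0.5060, 0.4942),
    "pick-empty": (1.936, 0.6881),
}

for name, (before, after) in timings.items():
    print(f"{name}: {speedup(before, after):.2f}x")
```

For pick-empty this gives a ratio of about 2.8, which is the
"approximately 3" quoted above.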


There was also another testcase at $DAYJOB where I saw a factor of 7
improvement from this particular optimization, so it certainly has the
potential to help when the previous optimizations are not quite enough.

As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with (as noted in the same commit message)
were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


[1]
https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf

Elijah Newren (8):
  diffcore-rename: take advantage of "majority rules" to skip more
    renames
  merge-ort, diffcore-rename: tweak dirs_removed and relevant_source
    type
  merge-ort: record the reason that we want a rename for a directory
  diffcore-rename: only compute dir_rename_count for relevant
    directories
  diffcore-rename: check if we have enough renames for directories early
    on
  diffcore-rename: add computation of number of unknown renames
  merge-ort: record the reason that we want a rename for a file
  diffcore-rename: determine which relevant_sources are no longer
    relevant

 diffcore-rename.c | 230 ++++++++++++++++++++++++++++++++++++++++------
 diffcore.h        |  19 +++-
 merge-ort.c       |  79 ++++++++++++----
 3 files changed, 281 insertions(+), 47 deletions(-)


base-commit: 98b0c7de5e70d62d47c3eeb3d290c6a234214f40
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-853%2Fnewren%2Fort-perf-batch-10-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-853/newren/ort-perf-batch-10-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/853
-- 
gitgitgadget


Thread overview: 14+ messages
2021-03-13 22:22 Elijah Newren via GitGitGadget [this message]
2021-03-13 22:22 ` [PATCH 1/8] diffcore-rename: take advantage of "majority rules" to skip more renames Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 2/8] merge-ort, diffcore-rename: tweak dirs_removed and relevant_source type Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 3/8] merge-ort: record the reason that we want a rename for a directory Elijah Newren via GitGitGadget
2021-03-15 14:31   ` Derrick Stolee
2021-03-15 15:27     ` Elijah Newren
2021-03-28  2:01       ` Junio C Hamano
2021-03-13 22:22 ` [PATCH 4/8] diffcore-rename: only compute dir_rename_count for relevant directories Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 5/8] diffcore-rename: check if we have enough renames for directories early on Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 6/8] diffcore-rename: add computation of number of unknown renames Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 7/8] merge-ort: record the reason that we want a rename for a file Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 8/8] diffcore-rename: determine which relevant_sources are no longer relevant Elijah Newren via GitGitGadget
2021-03-15 15:21 ` [PATCH 0/8] Optimization batch 10: avoid detecting even more irrelevant renames Derrick Stolee
2021-03-15 15:34   ` Elijah Newren
