From: Elijah Newren <newren@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>, Jeff King <peff@peff.net>,
Jonathan Nieder <jrnieder@gmail.com>,
Jonathan Tan <jonathantanmy@google.com>,
Taylor Blau <me@ttaylorr.com>,
gitster@pobox.com, Elijah Newren <newren@gmail.com>
Subject: [PATCH v4 0/3] And so it begins...merge/rename performance work
Date: Sat, 23 Jan 2021 22:01:09 -0800 [thread overview]
Message-ID: <20210124060112.1258291-1-newren@gmail.com> (raw)
In-Reply-To: <20210115192958.3336755-1-newren@gmail.com>
This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
and en/ort-directory-rename.
Changes since v3:
* Add a couple preliminary patches needed for later performance work,
including fixing a big memory leak (in some series that has already
been merged to master, I should have included a small additional
section of code from my 'ort' branch, but I overlooked it).
* Update the performance numbers in the commit message of the final
patch based on the changes; the memory leak makes a noticeable
difference on the overall timings since it basically represents
a combination of all the data allocated during the merge algorithm.
Elijah Newren (3):
merge-ort: fix massive leak
merge-ort: ignore the directory rename split conflict for now
merge-ort: begin performance work; instrument with trace2_region_*
calls
diffcore-rename.c | 8 +++++
merge-ort.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 94 insertions(+), 1 deletion(-)
Range-diff:
-: ---------- > 1: 549a63cd2a merge-ort: fix massive leak
-: ---------- > 2: 817a197dbc merge-ort: ignore the directory rename split conflict for now
1: 36d1f87d05 ! 3: 7f7d4a3e17 merge-ort: begin performance work; instrument with trace2_region_* calls
@@ Commit message
Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):
- merge-recursive merge-ort
- no-renames: 18.912 s ± 0.174 s 12.975 s ± 0.037 s
- mega-renames: 5964.031 s ± 10.459 s 5154.338 s ± 19.139 s
- just-one-mega: 149.583 s ± 0.751 s 146.703 s ± 0.852 s
+ merge-recursive merge-ort
+ no-renames: 18.912 s ± 0.174 s 14.263 s ± 0.053 s
+ mega-renames: 5964.031 s ± 10.459 s 5504.231 s ± 5.150 s
+ just-one-mega: 149.583 s ± 0.751 s 158.534 s ± 0.498 s
A single re-run of each with some breakdowns:
- --- no-renames ---
- merge-recursive merge-ort
- overall runtime: 19.302 s 13.017 s
- inexact rename detection: 7.603 s 7.695 s
- everything else: 11.699 s 5.322 s
+ --- no-renames ---
+ merge-recursive merge-ort
+ overall runtime: 19.302 s 14.257 s
+ inexact rename detection: 7.603 s 7.906 s
+ everything else: 11.699 s 6.351 s
- --- mega-renames ---
- merge-recursive merge-ort
- overall runtime: 5950.195 s 5132.851 s
- inexact rename detection: 5746.309 s 5119.215 s
- everything else: 203.886 s 13.636 s
+ --- mega-renames ---
+ merge-recursive merge-ort
+ overall runtime: 5950.195 s 5499.672 s
+ inexact rename detection: 5746.309 s 5487.120 s
+ everything else: 203.886 s 17.552 s
- --- just-one-mega ---
- merge-recursive merge-ort
- overall runtime: 151.001 s 146.478 s
- inexact rename detection: 143.448 s 145.901 s
- everything else: 7.553 s 0.577 s
+ --- just-one-mega ---
+ merge-recursive merge-ort
+ overall runtime: 151.001 s 158.582 s
+ inexact rename detection: 143.448 s 157.835 s
+ everything else: 7.553 s 0.747 s
=== Timing observations ===
+ 0) Maximum speedup
+
+ The "everything else" row represents the maximum speedup we could
+ achieve if we were to somehow infinitely parallelize inexact rename
+ detection, but leave everything else alone. The fact that this is so
+ much smaller than the real runtime (even in the case with virtually no
+ renames) makes it clear just how overwhelmingly large the time spent on
+ rename detection can be.
+
1) no-renames
1a) merge-ort is faster than merge-recursive, which is nice. However,
@@ Commit message
2a) Obviously rename detection is a huge cost; it's where most the time
is spent. We need to cut that down. If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
- drop to about 204s, and the merge-ort time would drop to about 14s. I
+ drop to about 204s, and the merge-ort time would drop to about 17s. I
think this particular stat shows I've subtly baked a couple performance
- improvements into merge-ort[A] (one of them large) and into
- fast-rebase[B] already.
-
- [A] Avoid quadratic behavior with O(N) insertions or removals
- of entries in the index & avoid unconditional dropping and
- re-reading of the index
- [B] Avoid updating the on-disk index or the working directory
- for intermediate patches -- only update at the end
-
- 2b) rename-detection is somehow ~10% cheaper for merge-ort than
- merge-recursive. This was and is a big surprise to me. Both of them
- call diff_tree_oid() and diffcore_std() with the EXACT same inputs. I
- don't have an explanation, but it is very consistent even after
- re-running many times. Interestingly, the rename detection for the
- first patch is more expensive (just barely) for merge-ort than
- merge-recursive, and that is also consistent. I won't investigate this
- further, as I'm just going to focus on 1a & 2a.
+ improvements into merge-ort and into fast-rebase already.
3) just-one-mega
@@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
assert(opt->branch1 && opt->branch2);
@@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
- assert(opt->obuf.len == 0);
-
- assert(opt->priv == NULL);
+ assert(!opt->priv->toplevel_dir ||
+ 0 == strlen(opt->priv->toplevel_dir));
+ }
+ trace2_region_leave("merge", "sanity checks", opt->repo);
/* Default to histogram diff. Actually, just hardcode it...for now. */
@@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
/* Initialization of opt->priv, our internal merge data */
+ trace2_region_enter("merge", "allocate/init", opt->repo);
- opt->priv = xcalloc(1, sizeof(*opt->priv));
-
- /* Initialization of various renames fields */
+ if (opt->priv) {
+ clear_or_reinit_internal_opts(opt->priv, 1);
+ trace2_region_leave("merge", "allocate/init", opt->repo);
@@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
* subset of the overall paths that have special output.
*/
--
2.30.0.135.g7f7d4a3e17
next prev parent reply other threads:[~2021-01-24 6:04 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-08 20:51 [PATCH 0/1] And so it begins...merge/rename performance work Elijah Newren
2021-01-08 20:51 ` [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-08 20:59 ` Taylor Blau
2021-01-08 21:50 ` Elijah Newren
2021-01-08 21:55 ` Taylor Blau
2021-01-09 0:52 ` Elijah Newren
2021-01-13 22:11 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Elijah Newren
2021-01-13 22:11 ` [PATCH v2 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-14 19:08 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Junio C Hamano
2021-01-14 20:18 ` Elijah Newren
2021-01-15 19:29 ` [PATCH v3 " Elijah Newren
2021-01-15 19:29 ` [PATCH v3 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-24 6:01 ` Elijah Newren [this message]
2021-01-24 6:01 ` [PATCH v4 1/3] merge-ort: fix massive leak Elijah Newren
2021-01-24 19:11 ` Derrick Stolee
2021-01-24 6:01 ` [PATCH v4 2/3] merge-ort: ignore the directory rename split conflict for now Elijah Newren
2021-01-24 6:01 ` [PATCH v4 3/3] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210124060112.1258291-1-newren@gmail.com \
--to=newren@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
--cc=jrnieder@gmail.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).