git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>, Jeff King <peff@peff.net>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>,
	gitster@pobox.com, Elijah Newren <newren@gmail.com>
Subject: [PATCH v4 0/3] And so it begins...merge/rename performance work
Date: Sat, 23 Jan 2021 22:01:09 -0800	[thread overview]
Message-ID: <20210124060112.1258291-1-newren@gmail.com> (raw)
In-Reply-To: <20210115192958.3336755-1-newren@gmail.com>

This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
and en/ort-directory-rename.

Changes since v3:
  * Add a couple preliminary patches needed for later performance work,
    including fixing a big memory leak (in some series that has already
    been merged to master, I should have included a small additional
    section of code from my 'ort' branch, but I overlooked it).
  * Update the performance numbers in the commit message of the final
    patch based on the changes; the memory leak makes a noticeable
    difference on the overall timings since it basically represents
    a combination of all the data allocated during the merge algorithm.

Elijah Newren (3):
  merge-ort: fix massive leak
  merge-ort: ignore the directory rename split conflict for now
  merge-ort: begin performance work; instrument with trace2_region_*
    calls

 diffcore-rename.c |  8 +++++
 merge-ort.c       | 87 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 94 insertions(+), 1 deletion(-)

Range-diff:
-:  ---------- > 1:  549a63cd2a merge-ort: fix massive leak
-:  ---------- > 2:  817a197dbc merge-ort: ignore the directory rename split conflict for now
1:  36d1f87d05 ! 3:  7f7d4a3e17 merge-ort: begin performance work; instrument with trace2_region_* calls
    @@ Commit message
         Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
         10 runs for the other two cases):
     
    -                      merge-recursive         merge-ort
    -      no-renames:        18.912 s ±  0.174 s    12.975 s ±  0.037 s
    -      mega-renames:    5964.031 s ± 10.459 s  5154.338 s ± 19.139 s
    -      just-one-mega:    149.583 s ±  0.751 s   146.703 s ±  0.852 s
    +                           merge-recursive           merge-ort
    +        no-renames:       18.912 s ±  0.174 s    14.263 s ±  0.053 s
    +        mega-renames:   5964.031 s ± 10.459 s  5504.231 s ±  5.150 s
    +        just-one-mega:   149.583 s ±  0.751 s   158.534 s ±  0.498 s
     
         A single re-run of each with some breakdowns:
     
    -                                      ---  no-renames  ---
    -                                merge-recursive   merge-ort
    -      overall runtime:              19.302 s        13.017 s
    -      inexact rename detection:      7.603 s         7.695 s
    -      everything else:              11.699 s         5.322 s
    +                                        ---  no-renames  ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:              19.302 s        14.257 s
    +        inexact rename detection:      7.603 s         7.906 s
    +        everything else:              11.699 s         6.351 s
     
    -                                      --- mega-renames ---
    -                                merge-recursive   merge-ort
    -      overall runtime:            5950.195 s      5132.851 s
    -      inexact rename detection:   5746.309 s      5119.215 s
    -      everything else:             203.886 s        13.636 s
    +                                        --- mega-renames ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:            5950.195 s      5499.672 s
    +        inexact rename detection:   5746.309 s      5487.120 s
    +        everything else:             203.886 s        17.552 s
     
    -                                      --- just-one-mega ---
    -                                merge-recursive   merge-ort
    -      overall runtime:             151.001 s       146.478 s
    -      inexact rename detection:    143.448 s       145.901 s
    -      everything else:               7.553 s         0.577 s
    +                                        --- just-one-mega ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:             151.001 s       158.582 s
    +        inexact rename detection:    143.448 s       157.835 s
    +        everything else:               7.553 s         0.747 s
     
         === Timing observations ===
     
    +    0) Maximum speedup
    +
    +    The "everything else" row represents the maximum speedup we could
    +    achieve if we were to somehow infinitely parallelize inexact rename
    +    detection, but leave everything else alone.  The fact that this is so
    +    much smaller than the real runtime (even in the case with virtually no
    +    renames) makes it clear just how overwhelmingly large the time spent on
    +    rename detection can be.
    +
         1) no-renames
     
         1a) merge-ort is faster than merge-recursive, which is nice.  However,
    @@ Commit message
         2a) Obviously rename detection is a huge cost; it's where most the time
         is spent.  We need to cut that down.  If we could somehow infinitely
         parallelize it and drive its time to 0, the merge-recursive time would
    -    drop to about 204s, and the merge-ort time would drop to about 14s.  I
    +    drop to about 204s, and the merge-ort time would drop to about 17s.  I
         think this particular stat shows I've subtly baked a couple performance
    -    improvements into merge-ort[A] (one of them large) and into
    -    fast-rebase[B] already.
    -
    -        [A] Avoid quadratic behavior with O(N) insertions or removals
    -            of entries in the index & avoid unconditional dropping and
    -            re-reading of the index
    -        [B] Avoid updating the on-disk index or the working directory
    -            for intermediate patches -- only update at the end
    -
    -    2b) rename-detection is somehow ~10% cheaper for merge-ort than
    -    merge-recursive.  This was and is a big surprise to me.  Both of them
    -    call diff_tree_oid() and diffcore_std() with the EXACT same inputs.  I
    -    don't have an explanation, but it is very consistent even after
    -    re-running many times.  Interestingly, the rename detection for the
    -    first patch is more expensive (just barely) for merge-ort than
    -    merge-recursive, and that is also consistent.  I won't investigate this
    -    further, as I'm just going to focus on 1a & 2a.
    +    improvements into merge-ort and into fast-rebase already.
     
         3) just-one-mega
     
    @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
      
      	assert(opt->branch1 && opt->branch2);
     @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
    - 	assert(opt->obuf.len == 0);
    - 
    - 	assert(opt->priv == NULL);
    + 		assert(!opt->priv->toplevel_dir ||
    + 		       0 == strlen(opt->priv->toplevel_dir));
    + 	}
     +	trace2_region_leave("merge", "sanity checks", opt->repo);
      
      	/* Default to histogram diff.  Actually, just hardcode it...for now. */
    @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
      
      	/* Initialization of opt->priv, our internal merge data */
     +	trace2_region_enter("merge", "allocate/init", opt->repo);
    - 	opt->priv = xcalloc(1, sizeof(*opt->priv));
    - 
    - 	/* Initialization of various renames fields */
    + 	if (opt->priv) {
    + 		clear_or_reinit_internal_opts(opt->priv, 1);
    + 		trace2_region_leave("merge", "allocate/init", opt->repo);
     @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
      	 * subset of the overall paths that have special output.
      	 */
-- 
2.30.0.135.g7f7d4a3e17


  parent reply	other threads:[~2021-01-24  6:04 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-08 20:51 [PATCH 0/1] And so it begins...merge/rename performance work Elijah Newren
2021-01-08 20:51 ` [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-08 20:59   ` Taylor Blau
2021-01-08 21:50     ` Elijah Newren
2021-01-08 21:55       ` Taylor Blau
2021-01-09  0:52         ` Elijah Newren
2021-01-13 22:11 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Elijah Newren
2021-01-13 22:11   ` [PATCH v2 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-14 19:08   ` [PATCH v2 0/1] And so it begins...merge/rename performance work Junio C Hamano
2021-01-14 20:18     ` Elijah Newren
2021-01-15 19:29   ` [PATCH v3 " Elijah Newren
2021-01-15 19:29     ` [PATCH v3 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-24  6:01     ` Elijah Newren [this message]
2021-01-24  6:01       ` [PATCH v4 1/3] merge-ort: fix massive leak Elijah Newren
2021-01-24 19:11         ` Derrick Stolee
2021-01-24  6:01       ` [PATCH v4 2/3] merge-ort: ignore the directory rename split conflict for now Elijah Newren
2021-01-24  6:01       ` [PATCH v4 3/3] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210124060112.1258291-1-newren@gmail.com \
    --to=newren@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).