git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] And so it begins...merge/rename performance work
@ 2021-01-08 20:51 Elijah Newren
  2021-01-08 20:51 ` [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
  2021-01-13 22:11 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Elijah Newren
  0 siblings, 2 replies; 17+ messages in thread
From: Elijah Newren @ 2021-01-08 20:51 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, Elijah Newren

This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
and en/ort-directory-rename.

This series begins the performance work for merge-ort and
diffcore-rename.  This series only has one patch and all it does is add
trace2_region enter/leave pairs -- but it comes with a lengthy commit
message detailing my driving testcases, the current status, and my
plans.  Part of the point of the lengthy testcase description is it will
allow me to repeatedly refer to it in subsequent series' commit messages
with a paragraph of the form:

    For the testcases mentioned in commit 9542932eee ("merge-ort: begin
    performance work; instrument with trace2_region_* calls", 2020-10-28),
    this change improves the performance as follows:
    
                                  Before                  After
          no-renames:       12.975 s ±  0.037 s    12.904 s ±  0.069 s
          mega-renames:   5154.338 s ± 19.139 s  1670.582 s ±  0.904 s
          just-one-mega:   146.703 s ±  0.852 s    48.149 s ±  0.306 s


Elijah Newren (1):
  merge-ort: begin performance work; instrument with trace2_region_*
    calls

 diffcore-rename.c |  8 +++++++
 merge-ort.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

-- 
2.29.1.107.g69489f3566


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls
  2021-01-08 20:51 [PATCH 0/1] And so it begins...merge/rename performance work Elijah Newren
@ 2021-01-08 20:51 ` Elijah Newren
  2021-01-08 20:59   ` Taylor Blau
  2021-01-13 22:11 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Elijah Newren
  1 sibling, 1 reply; 17+ messages in thread
From: Elijah Newren @ 2021-01-08 20:51 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, Elijah Newren

Add some timing instrumentation for both merge-ort and diffcore-rename;
I used these to measure and optimize performance in both, and several
future patch series will build on these to reduce the timings of some
select testcases.

=== Setup ===

The primary testcase I used involved rebasing a random topic in the
linux kernel (consisting of 35 patches) against an older version.  I
added two variants, one where I rename a toplevel directory, and another
where I only rebase one patch instead of the whole topic.  The setup is
as follows:

  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e
  $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34
  $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e
  $ git switch -c 5.4-renames v5.4
  $ git mv drivers pilots  # Introduce over 26,000 renames
  $ git commit -m "Rename drivers/ to pilots/"

=== Testcases ===

Now with REBASE standing for either "git rebase [--merge]" (using
merge-recursive) or "test-tool fast-rebase" (using merge-ort), the
testcases are:

Testcase #1: no-renames

  $ git checkout v5.4^0
  $ REBASE --onto HEAD base hwmon-updates

  Note: technically the name is misleading; there are some renames, but
  very few.  Rename detection only takes about half the overall time.

Testcase #2: mega-renames

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-updates

Testcase #3: just-one-mega

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-just-one

=== Timing results ===

Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):

                  merge-recursive         merge-ort
  no-renames:        18.912 s ±  0.174 s    12.975 s ±  0.037 s
  mega-renames:    5964.031 s ± 10.459 s  5154.338 s ± 19.139 s
  just-one-mega:    149.583 s ±  0.751 s   146.703 s ±  0.852 s

A single re-run of each with some breakdowns:

                                  ---  no-renames  ---
                            merge-recursive   merge-ort
  overall runtime:              19.302 s        13.017 s
  inexact rename detection:      7.603 s         7.695 s
  everything else:              11.699 s         5.322 s

                                  --- mega-renames ---
                            merge-recursive   merge-ort
  overall runtime:            5950.195 s      5132.851 s
  inexact rename detection:   5746.309 s      5119.215 s
  everything else:             203.886 s        13.636 s

                                  --- just-one-mega ---
                            merge-recursive   merge-ort
  overall runtime:             151.001 s       146.478 s
  inexact rename detection:    143.448 s       145.901 s
  everything else:               7.553 s         0.577 s

=== Timing observations ===

1) no-renames

1a) merge-ort is faster than merge-recursive, which is nice.  However,
this still should not be considered good enough.  Although the "merge"
backend to rebase (merge-recursive) is sometimes faster than the "apply"
backend, this is one of those cases where it is not.  In fact, even
merge-ort is slower.  The "apply" backend can complete this testcase in
    6.940 s ± 0.485 s
which is about 2x faster than merge-ort and 3x faster than
merge-recursive.  One goal of the merge-ort performance work will be to
make it faster than git-am on this (and similar) testcases.

2) mega-renames

2a) Obviously rename detection is a huge cost; it's where most the time
is spent.  We need to cut that down.  If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
drop to about 204s, and the merge-ort time would drop to about 14s.  I
think this particular stat shows I've subtly baked a couple performance
improvements into merge-ort[A] (one of them large) and into
fast-rebase[B] already.

    [A] Avoid quadratic behavior with O(N) insertions or removals
	of entries in the index & avoid unconditional dropping and
        re-reading of the index
    [B] Avoid updating the on-disk index or the working directory
        for intermediate patches -- only update at the end

2b) rename-detection is somehow ~10% cheaper for merge-ort than
merge-recursive.  This was and is a big surprise to me.  Both of them
call diff_tree_oid() and diffcore_std() with the EXACT same inputs.  I
don't have an explanation, but it is very consistent even after
re-running many times.  Interestingly, the rename detection for the
first patch is more expensive (just barely) for merge-ort than
merge-recursive, and that is also consistent.  I won't investigate this
further, as I'm just going to focus on 1a & 2a.

3) just-one-mega

3a) not much to say here, it just gives some flavor for how rebasing
only one patch compares to rebasing 35.

=== Goals ===

This patch is obviously just the beginning.  Here are some of my goals
that this measurement will help us achieve:

* Drive the cost of rename detection down considerably for merges
* After the above has been achieved, see if there are other slowness
  factors (which would have previously been overshadowed by rename
  detection costs) which we can then focus on and also optimize.
* Ensure our rebase testcase that requires little rename detection
  is noticeably faster with merge-ort than with apply-based rebase.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c |  8 +++++++
 merge-ort.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 90db9ebd6d..8fe6c9384b 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -465,6 +465,7 @@ void diffcore_rename(struct diff_options *options)
 	int num_destinations, dst_cnt;
 	struct progress *progress = NULL;
 
+	trace2_region_enter("diff", "setup", options->repo);
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
 
@@ -510,14 +511,17 @@ void diffcore_rename(struct diff_options *options)
 			register_rename_src(p);
 		}
 	}
+	trace2_region_leave("diff", "setup", options->repo);
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
 		goto cleanup; /* nothing to do */
 
+	trace2_region_enter("diff", "exact renames", options->repo);
 	/*
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
 	rename_count = find_exact_renames(options);
+	trace2_region_leave("diff", "exact renames", options->repo);
 
 	/* Did we only want exact renames? */
 	if (minimum_score == MAX_SCORE)
@@ -545,6 +549,7 @@ void diffcore_rename(struct diff_options *options)
 		break;
 	}
 
+	trace2_region_enter("diff", "inexact renames", options->repo);
 	if (options->show_rename_progress) {
 		progress = start_delayed_progress(
 				_("Performing inexact rename detection"),
@@ -600,11 +605,13 @@ void diffcore_rename(struct diff_options *options)
 	if (detect_rename == DIFF_DETECT_COPY)
 		rename_count += find_renames(mx, dst_cnt, minimum_score, 1);
 	free(mx);
+	trace2_region_leave("diff", "inexact renames", options->repo);
 
  cleanup:
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+	trace2_region_enter("diff", "write back to queue", options->repo);
 	DIFF_QUEUE_CLEAR(&outq);
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
@@ -680,5 +687,6 @@ void diffcore_rename(struct diff_options *options)
 		strintmap_clear(break_idx);
 		FREE_AND_NULL(break_idx);
 	}
+	trace2_region_leave("diff", "write back to queue", options->repo);
 	return;
 }
diff --git a/merge-ort.c b/merge-ort.c
index 8f4ca4fe83..0f3ad78f3c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -752,7 +752,9 @@ static int collect_merge_info(struct merge_options *opt,
 	init_tree_desc(t + 1, side1->buffer, side1->size);
 	init_tree_desc(t + 2, side2->buffer, side2->size);
 
+	trace2_region_enter("merge", "traverse_trees", opt->repo);
 	ret = traverse_trees(NULL, 3, t, &info);
+	trace2_region_leave("merge", "traverse_trees", opt->repo);
 
 	return ret;
 }
@@ -2095,9 +2097,12 @@ static void detect_regular_renames(struct merge_options *opt,
 	diff_opts.show_rename_progress = opt->show_rename_progress;
 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	diff_setup_done(&diff_opts);
+
+	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diff_tree_oid(&merge_base->object.oid, &side->object.oid, "",
 		      &diff_opts);
 	diffcore_std(&diff_opts);
+	trace2_region_leave("diff", "diffcore_rename", opt->repo);
 
 	if (diff_opts.needed_rename_limit > renames->needed_limit)
 		renames->needed_limit = diff_opts.needed_rename_limit;
@@ -2196,9 +2201,12 @@ static int detect_and_process_renames(struct merge_options *opt,
 
 	memset(&combined, 0, sizeof(combined));
 
+	trace2_region_enter("merge", "regular renames", opt->repo);
 	detect_regular_renames(opt, merge_base, side1, MERGE_SIDE1);
 	detect_regular_renames(opt, merge_base, side2, MERGE_SIDE2);
+	trace2_region_leave("merge", "regular renames", opt->repo);
 
+	trace2_region_enter("merge", "directory renames", opt->repo);
 	need_dir_renames =
 	  !opt->priv->call_depth &&
 	  (opt->detect_directory_renames == MERGE_DIRECTORY_RENAMES_TRUE ||
@@ -2220,8 +2228,11 @@ static int detect_and_process_renames(struct merge_options *opt,
 				 &renames->dir_renames[1],
 				 &renames->dir_renames[2]);
 	QSORT(combined.queue, combined.nr, compare_pairs);
+	trace2_region_leave("merge", "directory renames", opt->repo);
 
+	trace2_region_enter("merge", "process renames", opt->repo);
 	clean &= process_renames(opt, &combined);
+	trace2_region_leave("merge", "process renames", opt->repo);
 
 	/* Free memory for renames->pairs[] and combined */
 	for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) {
@@ -2903,20 +2914,30 @@ static void process_entries(struct merge_options *opt,
 						   STRING_LIST_INIT_NODUP,
 						   NULL, 0 };
 
+	trace2_region_enter("merge", "process_entries setup", opt->repo);
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
 		return;
 	}
 
 	/* Hack to pre-allocate plist to the desired size */
+	trace2_region_enter("merge", "plist grow", opt->repo);
 	ALLOC_GROW(plist.items, strmap_get_size(&opt->priv->paths), plist.alloc);
+	trace2_region_leave("merge", "plist grow", opt->repo);
 
 	/* Put every entry from paths into plist, then sort */
+	trace2_region_enter("merge", "plist copy", opt->repo);
 	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
 		string_list_append(&plist, e->key)->util = e->value;
 	}
+	trace2_region_leave("merge", "plist copy", opt->repo);
+
+	trace2_region_enter("merge", "plist special sort", opt->repo);
 	plist.cmp = string_list_df_name_compare;
 	string_list_sort(&plist);
+	trace2_region_leave("merge", "plist special sort", opt->repo);
+
+	trace2_region_leave("merge", "process_entries setup", opt->repo);
 
 	/*
 	 * Iterate over the items in reverse order, so we can handle paths
@@ -2927,6 +2948,7 @@ static void process_entries(struct merge_options *opt,
 	 * (because it allows us to know whether the directory is still in
 	 * the way when it is time to process the file at the same path).
 	 */
+	trace2_region_enter("merge", "processing", opt->repo);
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
 		/*
@@ -2945,7 +2967,9 @@ static void process_entries(struct merge_options *opt,
 			process_entry(opt, path, ci, &dir_metadata);
 		}
 	}
+	trace2_region_leave("merge", "processing", opt->repo);
 
+	trace2_region_enter("merge", "process_entries cleanup", opt->repo);
 	if (dir_metadata.offsets.nr != 1 ||
 	    (uintptr_t)dir_metadata.offsets.items[0].util != 0) {
 		printf("dir_metadata.offsets.nr = %d (should be 1)\n",
@@ -2960,6 +2984,7 @@ static void process_entries(struct merge_options *opt,
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
 	string_list_clear(&dir_metadata.offsets, 0);
+	trace2_region_leave("merge", "process_entries cleanup", opt->repo);
 }
 
 /*** Function Grouping: functions related to merge_switch_to_result() ***/
@@ -3118,12 +3143,15 @@ void merge_switch_to_result(struct merge_options *opt,
 	if (result->clean >= 0 && update_worktree_and_index) {
 		struct merge_options_internal *opti = result->priv;
 
+		trace2_region_enter("merge", "checkout", opt->repo);
 		if (checkout(opt, head, result->tree)) {
 			/* failure to function */
 			result->clean = -1;
 			return;
 		}
+		trace2_region_leave("merge", "checkout", opt->repo);
 
+		trace2_region_enter("merge", "record_conflicted", opt->repo);
 		if (record_conflicted_index_entries(opt, opt->repo->index,
 						    &opti->paths,
 						    &opti->conflicted)) {
@@ -3131,6 +3159,7 @@ void merge_switch_to_result(struct merge_options *opt,
 			result->clean = -1;
 			return;
 		}
+		trace2_region_leave("merge", "record_conflicted", opt->repo);
 	}
 
 	if (display_update_msgs) {
@@ -3140,6 +3169,8 @@ void merge_switch_to_result(struct merge_options *opt,
 		struct string_list olist = STRING_LIST_INIT_NODUP;
 		int i;
 
+		trace2_region_enter("merge", "display messages", opt->repo);
+
 		/* Hack to pre-allocate olist to the desired size */
 		ALLOC_GROW(olist.items, strmap_get_size(&opti->output),
 			   olist.alloc);
@@ -3161,6 +3192,8 @@ void merge_switch_to_result(struct merge_options *opt,
 		/* Also include needed rename limit adjustment now */
 		diff_warn_rename_limit("merge.renamelimit",
 				       opti->renames.needed_limit, 0);
+
+		trace2_region_leave("merge", "display messages", opt->repo);
 	}
 
 	merge_finalize(opt, result);
@@ -3202,6 +3235,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	int i;
 
 	/* Sanity checks on opt */
+	trace2_region_enter("merge", "sanity checks", opt->repo);
 	assert(opt->repo);
 
 	assert(opt->branch1 && opt->branch2);
@@ -3228,11 +3262,13 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	assert(opt->obuf.len == 0);
 
 	assert(opt->priv == NULL);
+	trace2_region_leave("merge", "sanity checks", opt->repo);
 
 	/* Default to histogram diff.  Actually, just hardcode it...for now. */
 	opt->xdl_opts = DIFF_WITH_ALG(opt, HISTOGRAM_DIFF);
 
 	/* Initialization of opt->priv, our internal merge data */
+	trace2_region_enter("merge", "allocate/init", opt->repo);
 	opt->priv = xcalloc(1, sizeof(*opt->priv));
 
 	/* Initialization of various renames fields */
@@ -3265,6 +3301,8 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * subset of the overall paths that have special output.
 	 */
 	strmap_init(&opt->priv->output);
+
+	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
 /*** Function Grouping: merge_incore_*() and their internal variants ***/
@@ -3280,6 +3318,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 {
 	struct object_id working_tree_oid;
 
+	trace2_region_enter("merge", "collect_merge_info", opt->repo);
 	if (collect_merge_info(opt, merge_base, side1, side2) != 0) {
 		/*
 		 * TRANSLATORS: The %s arguments are: 1) tree hash of a merge
@@ -3292,10 +3331,16 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 		result->clean = -1;
 		return;
 	}
+	trace2_region_leave("merge", "collect_merge_info", opt->repo);
 
+	trace2_region_enter("merge", "renames", opt->repo);
 	result->clean = detect_and_process_renames(opt, merge_base,
 						   side1, side2);
+	trace2_region_leave("merge", "renames", opt->repo);
+
+	trace2_region_enter("merge", "process_entries", opt->repo);
 	process_entries(opt, &working_tree_oid);
+	trace2_region_leave("merge", "process_entries", opt->repo);
 
 	/* Set return values */
 	result->tree = parse_tree_indirect(&working_tree_oid);
@@ -3396,9 +3441,15 @@ void merge_incore_nonrecursive(struct merge_options *opt,
 			       struct tree *side2,
 			       struct merge_result *result)
 {
+	trace2_region_enter("merge", "incore_nonrecursive", opt->repo);
+
+	trace2_region_enter("merge", "merge_start", opt->repo);
 	assert(opt->ancestor != NULL);
 	merge_start(opt, result);
+	trace2_region_leave("merge", "merge_start", opt->repo);
+
 	merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);
+	trace2_region_leave("merge", "incore_nonrecursive", opt->repo);
 }
 
 void merge_incore_recursive(struct merge_options *opt,
@@ -3407,9 +3458,15 @@ void merge_incore_recursive(struct merge_options *opt,
 			    struct commit *side2,
 			    struct merge_result *result)
 {
+	trace2_region_enter("merge", "incore_recursive", opt->repo);
+
 	/* We set the ancestor label based on the merge_bases */
 	assert(opt->ancestor == NULL);
 
+	trace2_region_enter("merge", "merge_start", opt->repo);
 	merge_start(opt, result);
+	trace2_region_leave("merge", "merge_start", opt->repo);
+
 	merge_ort_internal(opt, merge_bases, side1, side2, result);
+	trace2_region_leave("merge", "incore_recursive", opt->repo);
 }
-- 
2.29.1.107.g69489f3566


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls
  2021-01-08 20:51 ` [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
@ 2021-01-08 20:59   ` Taylor Blau
  2021-01-08 21:50     ` Elijah Newren
  0 siblings, 1 reply; 17+ messages in thread
From: Taylor Blau @ 2021-01-08 20:59 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau

On Fri, Jan 08, 2021 at 12:51:11PM -0800, Elijah Newren wrote:
> Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
> 10 runs for the other two cases):

Ah, I love hyperfine. In case you don't already have this in your
arsenal, the following `--prepare` step is useful for measuring
cold-cache performance:

    --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

> === Goals ===
>
> This patch is obviously just the beginning.  Here are some of my goals
> that this measurement will help us achieve:
>
> * Drive the cost of rename detection down considerably for merges
> * After the above has been achieved, see if there are other slowness
>   factors (which would have previously been overshadowed by rename
>   detection costs) which we can then focus on and also optimize.
> * Ensure our rebase testcase that requires little rename detection
>   is noticeably faster with merge-ort than with apply-based rebase.

These are great, and I am looking forward to your work.

> Signed-off-by: Elijah Newren <newren@gmail.com>

Thanks, this patch looks good to me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls
  2021-01-08 20:59   ` Taylor Blau
@ 2021-01-08 21:50     ` Elijah Newren
  2021-01-08 21:55       ` Taylor Blau
  0 siblings, 1 reply; 17+ messages in thread
From: Elijah Newren @ 2021-01-08 21:50 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Git Mailing List, Derrick Stolee, Jeff King, Jonathan Nieder,
	Jonathan Tan, Taylor Blau

On Fri, Jan 8, 2021 at 12:59 PM Taylor Blau <ttaylorr@github.com> wrote:
>
> On Fri, Jan 08, 2021 at 12:51:11PM -0800, Elijah Newren wrote:
> > Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
> > 10 runs for the other two cases):
>
> Ah, I love hyperfine. In case you don't already have this in your
> arsenal, the following `--prepare` step is useful for measuring
> cold-cache performance:
>
>     --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

/proc/sys/vm/drop_caches is definitely useful for cold-cache
measurements and I've used it in other projects for that purpose.  I
think cold-cache testing makes sense for various I/O intensive areas
such as object lookup, but I ignored it here as I felt the merge code
is really about algorithmic performance.  So, I instead went the other
direction and ensured warm-cache testing by using a warmup run, in
order to ensure that I wasn't putting one of the tests at an unfair
disadvantage.  (Side note: My script that runs the tests actually does
more than the warmup run to ensure a fair playing field.  For example,
the script expires reflogs and runs a git prune before each hyperfine
invocation, to make sure that each hyperfine run starts with a fully
repacked repository with no loose objects.  Without the expire &
prune, enough perf testing of rebases in a short time period will
result in a mysterious and gradual slowdown of all the test runs even
without code changes...).

> > === Goals ===
> >
> > This patch is obviously just the beginning.  Here are some of my goals
> > that this measurement will help us achieve:
> >
> > * Drive the cost of rename detection down considerably for merges
> > * After the above has been achieved, see if there are other slowness
> >   factors (which would have previously been overshadowed by rename
> >   detection costs) which we can then focus on and also optimize.
> > * Ensure our rebase testcase that requires little rename detection
> >   is noticeably faster with merge-ort than with apply-based rebase.
>
> These are great, and I am looking forward to your work.
>
> > Signed-off-by: Elijah Newren <newren@gmail.com>
>
> Thanks, this patch looks good to me.

As always, thanks for taking a look.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls
  2021-01-08 21:50     ` Elijah Newren
@ 2021-01-08 21:55       ` Taylor Blau
  2021-01-09  0:52         ` Elijah Newren
  0 siblings, 1 reply; 17+ messages in thread
From: Taylor Blau @ 2021-01-08 21:55 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Derrick Stolee, Jeff King, Jonathan Nieder,
	Jonathan Tan, Taylor Blau

On Fri, Jan 08, 2021 at 01:50:34PM -0800, Elijah Newren wrote:
> On Fri, Jan 8, 2021 at 12:59 PM Taylor Blau <ttaylorr@github.com> wrote:
> >
> > On Fri, Jan 08, 2021 at 12:51:11PM -0800, Elijah Newren wrote:
> > > Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
> > > 10 runs for the other two cases):
> >
> > Ah, I love hyperfine. In case you don't already have this in your
> > arsenal, the following `--prepare` step is useful for measuring
> > cold-cache performance:
> >
> >     --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'
>
> /proc/sys/vm/drop_caches is definitely useful for cold-cache
> measurements and I've used it in other projects for that purpose.  I
> think cold-cache testing makes sense for various I/O intensive areas
> such as object lookup, but I ignored it here as I felt the merge code
> is really about algorithmic performance.

Yes, I agree that the interesting thing here is algorithmic performance
moreso than I/O.

> So, I instead went the other direction and ensured warm-cache testing
> by using a warmup run, in order to ensure that I wasn't putting one of
> the tests at an unfair disadvantage.

I often use it for both. Combining that `--prepare` step with at least
one `--warmup` invocation is useful to make sure that your I/O cache is
warmed only with the things it might want to read during your timing
tests. (Probably one `--warmup` without dumping the cache is fine, since
you will likely end up evicting things out of your cache that you don't
care about, but I digress..)

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls
  2021-01-08 21:55       ` Taylor Blau
@ 2021-01-09  0:52         ` Elijah Newren
  0 siblings, 0 replies; 17+ messages in thread
From: Elijah Newren @ 2021-01-09  0:52 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Git Mailing List, Derrick Stolee, Jeff King, Jonathan Nieder,
	Jonathan Tan, Taylor Blau

On Fri, Jan 8, 2021 at 1:55 PM Taylor Blau <ttaylorr@github.com> wrote:
>
> On Fri, Jan 08, 2021 at 01:50:34PM -0800, Elijah Newren wrote:
> > On Fri, Jan 8, 2021 at 12:59 PM Taylor Blau <ttaylorr@github.com> wrote:
> > >
> > > On Fri, Jan 08, 2021 at 12:51:11PM -0800, Elijah Newren wrote:
> > > > Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
> > > > 10 runs for the other two cases):
> > >
> > > Ah, I love hyperfine. In case you don't already have this in your
> > > arsenal, the following `--prepare` step is useful for measuring
> > > cold-cache performance:
> > >
> > >     --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'
> >
> > /proc/sys/vm/drop_caches is definitely useful for cold-cache
> > measurements and I've used it in other projects for that purpose.  I
> > think cold-cache testing makes sense for various I/O intensive areas
> > such as object lookup, but I ignored it here as I felt the merge code
> > is really about algorithmic performance.
>
> Yes, I agree that the interesting thing here is algorithmic performance
> moreso than I/O.
>
> > So, I instead went the other direction and ensured warm-cache testing
> > by using a warmup run, in order to ensure that I wasn't putting one of
> > the tests at an unfair disadvantage.
>
> I often use it for both. Combining that `--prepare` step with at least
> one `--warmup` invocation is useful to make sure that your I/O cache is
> warmed only with the things it might want to read during your timing
> tests. (Probably one `--warmup` without dumping the cache is fine, since
> you will likely end up evicting things out of your cache that you don't
> care about, but I digress..)

Ah, that hadn't occurred to me, but it makes sense.  Thanks for the
tip; I may give it a try at some point.  I worry slightly that it
might increase the run-to-run noise instead of decreasing it since I'm
committing sins by not running the performance tests on a quiet server
but on my laptop with a full GUI running -- a few year old,
nearly-bottom-of-the-line Dell refurbished grade B laptop with spinny
disks.  Dropping disk caches would lower the risk of needing to spend
time evicting other things from the warm cache, but would increase the
risk that some background GUI thing or system daemon needs to read
from the hard disk when it wouldn't have needed to otherwise, and if
the timing of that disk read is unfortunately placed, then it could
slow down I/O I care about.  I guess there's only one way to find out
if it'd help or hurt though...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 0/1] And so it begins...merge/rename performance work
  2021-01-08 20:51 [PATCH 0/1] And so it begins...merge/rename performance work Elijah Newren
  2021-01-08 20:51 ` [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
@ 2021-01-13 22:11 ` Elijah Newren
  2021-01-13 22:11   ` [PATCH v2 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
                     ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Elijah Newren @ 2021-01-13 22:11 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, gitster, Elijah Newren

This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
and en/ort-directory-rename.

Changes since v1:
  * Add a step I forgot in my testcase setup -- increasing
    merge.renameLimit
  * Add Acked-by from Taylor

Elijah Newren (1):
  merge-ort: begin performance work; instrument with trace2_region_*
    calls

 diffcore-rename.c |  8 +++++++
 merge-ort.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

Range-diff:
1:  9542932eee ! 1:  8783f209ef merge-ort: begin performance work; instrument with trace2_region_* calls
    @@ Commit message
           $ git switch -c 5.4-renames v5.4
           $ git mv drivers pilots  # Introduce over 26,000 renames
           $ git commit -m "Rename drivers/ to pilots/"
    +      $ git config merge.renameLimit 30000
     
         === Testcases ===
     
    @@ Commit message
           is noticeably faster with merge-ort than with apply-based rebase.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
    +    Acked-by: Taylor Blau <ttaylorr@github.com>
     
      ## diffcore-rename.c ##
     @@ diffcore-rename.c: void diffcore_rename(struct diff_options *options)
-- 
2.29.2.544.gecb49aa127.dirty


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls
  2021-01-13 22:11 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Elijah Newren
@ 2021-01-13 22:11   ` Elijah Newren
  2021-01-14 19:08   ` [PATCH v2 0/1] And so it begins...merge/rename performance work Junio C Hamano
  2021-01-15 19:29   ` [PATCH v3 " Elijah Newren
  2 siblings, 0 replies; 17+ messages in thread
From: Elijah Newren @ 2021-01-13 22:11 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, gitster, Elijah Newren, Taylor Blau

Add some timing instrumentation for both merge-ort and diffcore-rename;
I used these to measure and optimize performance in both, and several
future patch series will build on these to reduce the timings of some
select testcases.

=== Setup ===

The primary testcase I used involved rebasing a random topic in the
linux kernel (consisting of 35 patches) against an older version.  I
added two variants, one where I rename a toplevel directory, and another
where I only rebase one patch instead of the whole topic.  The setup is
as follows:

  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e
  $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34
  $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e
  $ git switch -c 5.4-renames v5.4
  $ git mv drivers pilots  # Introduce over 26,000 renames
  $ git commit -m "Rename drivers/ to pilots/"
  $ git config merge.renameLimit 30000

=== Testcases ===

Now with REBASE standing for either "git rebase [--merge]" (using
merge-recursive) or "test-tool fast-rebase" (using merge-ort), the
testcases are:

Testcase #1: no-renames

  $ git checkout v5.4^0
  $ REBASE --onto HEAD base hwmon-updates

  Note: technically the name is misleading; there are some renames, but
  very few.  Rename detection only takes about half the overall time.

Testcase #2: mega-renames

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-updates

Testcase #3: just-one-mega

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-just-one

=== Timing results ===

Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):

                  merge-recursive         merge-ort
  no-renames:        18.912 s ±  0.174 s    12.975 s ±  0.037 s
  mega-renames:    5964.031 s ± 10.459 s  5154.338 s ± 19.139 s
  just-one-mega:    149.583 s ±  0.751 s   146.703 s ±  0.852 s

A single re-run of each with some breakdowns:

                                  ---  no-renames  ---
                            merge-recursive   merge-ort
  overall runtime:              19.302 s        13.017 s
  inexact rename detection:      7.603 s         7.695 s
  everything else:              11.699 s         5.322 s

                                  --- mega-renames ---
                            merge-recursive   merge-ort
  overall runtime:            5950.195 s      5132.851 s
  inexact rename detection:   5746.309 s      5119.215 s
  everything else:             203.886 s        13.636 s

                                  --- just-one-mega ---
                            merge-recursive   merge-ort
  overall runtime:             151.001 s       146.478 s
  inexact rename detection:    143.448 s       145.901 s
  everything else:               7.553 s         0.577 s

=== Timing observations ===

1) no-renames

1a) merge-ort is faster than merge-recursive, which is nice.  However,
this still should not be considered good enough.  Although the "merge"
backend to rebase (merge-recursive) is sometimes faster than the "apply"
backend, this is one of those cases where it is not.  In fact, even
merge-ort is slower.  The "apply" backend can complete this testcase in
    6.940 s ± 0.485 s
which is about 2x faster than merge-ort and 3x faster than
merge-recursive.  One goal of the merge-ort performance work will be to
make it faster than git-am on this (and similar) testcases.

2) mega-renames

2a) Obviously rename detection is a huge cost; it's where most the time
is spent.  We need to cut that down.  If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
drop to about 204s, and the merge-ort time would drop to about 14s.  I
think this particular stat shows I've subtly baked a couple performance
improvements into merge-ort[A] (one of them large) and into
fast-rebase[B] already.

    [A] Avoid quadratic behavior with O(N) insertions or removals
	of entries in the index & avoid unconditional dropping and
        re-reading of the index
    [B] Avoid updating the on-disk index or the working directory
        for intermediate patches -- only update at the end

2b) rename-detection is somehow ~10% cheaper for merge-ort than
merge-recursive.  This was and is a big surprise to me.  Both of them
call diff_tree_oid() and diffcore_std() with the EXACT same inputs.  I
don't have an explanation, but it is very consistent even after
re-running many times.  Interestingly, the rename detection for the
first patch is more expensive (just barely) for merge-ort than
merge-recursive, and that is also consistent.  I won't investigate this
further, as I'm just going to focus on 1a & 2a.

3) just-one-mega

3a) not much to say here, it just gives some flavor for how rebasing
only one patch compares to rebasing 35.

=== Goals ===

This patch is obviously just the beginning.  Here are some of my goals
that this measurement will help us achieve:

* Drive the cost of rename detection down considerably for merges
* After the above has been achieved, see if there are other slowness
  factors (which would have previously been overshadowed by rename
  detection costs) which we can then focus on and also optimize.
* Ensure our rebase testcase that requires little rename detection
  is noticeably faster with merge-ort than with apply-based rebase.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <ttaylorr@github.com>
---
 diffcore-rename.c |  8 +++++++
 merge-ort.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 90db9ebd6d..8fe6c9384b 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -465,6 +465,7 @@ void diffcore_rename(struct diff_options *options)
 	int num_destinations, dst_cnt;
 	struct progress *progress = NULL;
 
+	trace2_region_enter("diff", "setup", options->repo);
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
 
@@ -510,14 +511,17 @@ void diffcore_rename(struct diff_options *options)
 			register_rename_src(p);
 		}
 	}
+	trace2_region_leave("diff", "setup", options->repo);
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
 		goto cleanup; /* nothing to do */
 
+	trace2_region_enter("diff", "exact renames", options->repo);
 	/*
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
 	rename_count = find_exact_renames(options);
+	trace2_region_leave("diff", "exact renames", options->repo);
 
 	/* Did we only want exact renames? */
 	if (minimum_score == MAX_SCORE)
@@ -545,6 +549,7 @@ void diffcore_rename(struct diff_options *options)
 		break;
 	}
 
+	trace2_region_enter("diff", "inexact renames", options->repo);
 	if (options->show_rename_progress) {
 		progress = start_delayed_progress(
 				_("Performing inexact rename detection"),
@@ -600,11 +605,13 @@ void diffcore_rename(struct diff_options *options)
 	if (detect_rename == DIFF_DETECT_COPY)
 		rename_count += find_renames(mx, dst_cnt, minimum_score, 1);
 	free(mx);
+	trace2_region_leave("diff", "inexact renames", options->repo);
 
  cleanup:
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+	trace2_region_enter("diff", "write back to queue", options->repo);
 	DIFF_QUEUE_CLEAR(&outq);
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
@@ -680,5 +687,6 @@ void diffcore_rename(struct diff_options *options)
 		strintmap_clear(break_idx);
 		FREE_AND_NULL(break_idx);
 	}
+	trace2_region_leave("diff", "write back to queue", options->repo);
 	return;
 }
diff --git a/merge-ort.c b/merge-ort.c
index 8f4ca4fe83..0f3ad78f3c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -752,7 +752,9 @@ static int collect_merge_info(struct merge_options *opt,
 	init_tree_desc(t + 1, side1->buffer, side1->size);
 	init_tree_desc(t + 2, side2->buffer, side2->size);
 
+	trace2_region_enter("merge", "traverse_trees", opt->repo);
 	ret = traverse_trees(NULL, 3, t, &info);
+	trace2_region_leave("merge", "traverse_trees", opt->repo);
 
 	return ret;
 }
@@ -2095,9 +2097,12 @@ static void detect_regular_renames(struct merge_options *opt,
 	diff_opts.show_rename_progress = opt->show_rename_progress;
 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	diff_setup_done(&diff_opts);
+
+	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diff_tree_oid(&merge_base->object.oid, &side->object.oid, "",
 		      &diff_opts);
 	diffcore_std(&diff_opts);
+	trace2_region_leave("diff", "diffcore_rename", opt->repo);
 
 	if (diff_opts.needed_rename_limit > renames->needed_limit)
 		renames->needed_limit = diff_opts.needed_rename_limit;
@@ -2196,9 +2201,12 @@ static int detect_and_process_renames(struct merge_options *opt,
 
 	memset(&combined, 0, sizeof(combined));
 
+	trace2_region_enter("merge", "regular renames", opt->repo);
 	detect_regular_renames(opt, merge_base, side1, MERGE_SIDE1);
 	detect_regular_renames(opt, merge_base, side2, MERGE_SIDE2);
+	trace2_region_leave("merge", "regular renames", opt->repo);
 
+	trace2_region_enter("merge", "directory renames", opt->repo);
 	need_dir_renames =
 	  !opt->priv->call_depth &&
 	  (opt->detect_directory_renames == MERGE_DIRECTORY_RENAMES_TRUE ||
@@ -2220,8 +2228,11 @@ static int detect_and_process_renames(struct merge_options *opt,
 				 &renames->dir_renames[1],
 				 &renames->dir_renames[2]);
 	QSORT(combined.queue, combined.nr, compare_pairs);
+	trace2_region_leave("merge", "directory renames", opt->repo);
 
+	trace2_region_enter("merge", "process renames", opt->repo);
 	clean &= process_renames(opt, &combined);
+	trace2_region_leave("merge", "process renames", opt->repo);
 
 	/* Free memory for renames->pairs[] and combined */
 	for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) {
@@ -2903,20 +2914,30 @@ static void process_entries(struct merge_options *opt,
 						   STRING_LIST_INIT_NODUP,
 						   NULL, 0 };
 
+	trace2_region_enter("merge", "process_entries setup", opt->repo);
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
 		return;
 	}
 
 	/* Hack to pre-allocate plist to the desired size */
+	trace2_region_enter("merge", "plist grow", opt->repo);
 	ALLOC_GROW(plist.items, strmap_get_size(&opt->priv->paths), plist.alloc);
+	trace2_region_leave("merge", "plist grow", opt->repo);
 
 	/* Put every entry from paths into plist, then sort */
+	trace2_region_enter("merge", "plist copy", opt->repo);
 	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
 		string_list_append(&plist, e->key)->util = e->value;
 	}
+	trace2_region_leave("merge", "plist copy", opt->repo);
+
+	trace2_region_enter("merge", "plist special sort", opt->repo);
 	plist.cmp = string_list_df_name_compare;
 	string_list_sort(&plist);
+	trace2_region_leave("merge", "plist special sort", opt->repo);
+
+	trace2_region_leave("merge", "process_entries setup", opt->repo);
 
 	/*
 	 * Iterate over the items in reverse order, so we can handle paths
@@ -2927,6 +2948,7 @@ static void process_entries(struct merge_options *opt,
 	 * (because it allows us to know whether the directory is still in
 	 * the way when it is time to process the file at the same path).
 	 */
+	trace2_region_enter("merge", "processing", opt->repo);
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
 		/*
@@ -2945,7 +2967,9 @@ static void process_entries(struct merge_options *opt,
 			process_entry(opt, path, ci, &dir_metadata);
 		}
 	}
+	trace2_region_leave("merge", "processing", opt->repo);
 
+	trace2_region_enter("merge", "process_entries cleanup", opt->repo);
 	if (dir_metadata.offsets.nr != 1 ||
 	    (uintptr_t)dir_metadata.offsets.items[0].util != 0) {
 		printf("dir_metadata.offsets.nr = %d (should be 1)\n",
@@ -2960,6 +2984,7 @@ static void process_entries(struct merge_options *opt,
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
 	string_list_clear(&dir_metadata.offsets, 0);
+	trace2_region_leave("merge", "process_entries cleanup", opt->repo);
 }
 
 /*** Function Grouping: functions related to merge_switch_to_result() ***/
@@ -3118,12 +3143,15 @@ void merge_switch_to_result(struct merge_options *opt,
 	if (result->clean >= 0 && update_worktree_and_index) {
 		struct merge_options_internal *opti = result->priv;
 
+		trace2_region_enter("merge", "checkout", opt->repo);
 		if (checkout(opt, head, result->tree)) {
 			/* failure to function */
 			result->clean = -1;
 			return;
 		}
+		trace2_region_leave("merge", "checkout", opt->repo);
 
+		trace2_region_enter("merge", "record_conflicted", opt->repo);
 		if (record_conflicted_index_entries(opt, opt->repo->index,
 						    &opti->paths,
 						    &opti->conflicted)) {
@@ -3131,6 +3159,7 @@ void merge_switch_to_result(struct merge_options *opt,
 			result->clean = -1;
 			return;
 		}
+		trace2_region_leave("merge", "record_conflicted", opt->repo);
 	}
 
 	if (display_update_msgs) {
@@ -3140,6 +3169,8 @@ void merge_switch_to_result(struct merge_options *opt,
 		struct string_list olist = STRING_LIST_INIT_NODUP;
 		int i;
 
+		trace2_region_enter("merge", "display messages", opt->repo);
+
 		/* Hack to pre-allocate olist to the desired size */
 		ALLOC_GROW(olist.items, strmap_get_size(&opti->output),
 			   olist.alloc);
@@ -3161,6 +3192,8 @@ void merge_switch_to_result(struct merge_options *opt,
 		/* Also include needed rename limit adjustment now */
 		diff_warn_rename_limit("merge.renamelimit",
 				       opti->renames.needed_limit, 0);
+
+		trace2_region_leave("merge", "display messages", opt->repo);
 	}
 
 	merge_finalize(opt, result);
@@ -3202,6 +3235,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	int i;
 
 	/* Sanity checks on opt */
+	trace2_region_enter("merge", "sanity checks", opt->repo);
 	assert(opt->repo);
 
 	assert(opt->branch1 && opt->branch2);
@@ -3228,11 +3262,13 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	assert(opt->obuf.len == 0);
 
 	assert(opt->priv == NULL);
+	trace2_region_leave("merge", "sanity checks", opt->repo);
 
 	/* Default to histogram diff.  Actually, just hardcode it...for now. */
 	opt->xdl_opts = DIFF_WITH_ALG(opt, HISTOGRAM_DIFF);
 
 	/* Initialization of opt->priv, our internal merge data */
+	trace2_region_enter("merge", "allocate/init", opt->repo);
 	opt->priv = xcalloc(1, sizeof(*opt->priv));
 
 	/* Initialization of various renames fields */
@@ -3265,6 +3301,8 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * subset of the overall paths that have special output.
 	 */
 	strmap_init(&opt->priv->output);
+
+	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
 /*** Function Grouping: merge_incore_*() and their internal variants ***/
@@ -3280,6 +3318,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 {
 	struct object_id working_tree_oid;
 
+	trace2_region_enter("merge", "collect_merge_info", opt->repo);
 	if (collect_merge_info(opt, merge_base, side1, side2) != 0) {
 		/*
 		 * TRANSLATORS: The %s arguments are: 1) tree hash of a merge
@@ -3292,10 +3331,16 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 		result->clean = -1;
 		return;
 	}
+	trace2_region_leave("merge", "collect_merge_info", opt->repo);
 
+	trace2_region_enter("merge", "renames", opt->repo);
 	result->clean = detect_and_process_renames(opt, merge_base,
 						   side1, side2);
+	trace2_region_leave("merge", "renames", opt->repo);
+
+	trace2_region_enter("merge", "process_entries", opt->repo);
 	process_entries(opt, &working_tree_oid);
+	trace2_region_leave("merge", "process_entries", opt->repo);
 
 	/* Set return values */
 	result->tree = parse_tree_indirect(&working_tree_oid);
@@ -3396,9 +3441,15 @@ void merge_incore_nonrecursive(struct merge_options *opt,
 			       struct tree *side2,
 			       struct merge_result *result)
 {
+	trace2_region_enter("merge", "incore_nonrecursive", opt->repo);
+
+	trace2_region_enter("merge", "merge_start", opt->repo);
 	assert(opt->ancestor != NULL);
 	merge_start(opt, result);
+	trace2_region_leave("merge", "merge_start", opt->repo);
+
 	merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);
+	trace2_region_leave("merge", "incore_nonrecursive", opt->repo);
 }
 
 void merge_incore_recursive(struct merge_options *opt,
@@ -3407,9 +3458,15 @@ void merge_incore_recursive(struct merge_options *opt,
 			    struct commit *side2,
 			    struct merge_result *result)
 {
+	trace2_region_enter("merge", "incore_recursive", opt->repo);
+
 	/* We set the ancestor label based on the merge_bases */
 	assert(opt->ancestor == NULL);
 
+	trace2_region_enter("merge", "merge_start", opt->repo);
 	merge_start(opt, result);
+	trace2_region_leave("merge", "merge_start", opt->repo);
+
 	merge_ort_internal(opt, merge_bases, side1, side2, result);
+	trace2_region_leave("merge", "incore_recursive", opt->repo);
 }
-- 
2.29.2.544.gecb49aa127.dirty


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/1] And so it begins...merge/rename performance work
  2021-01-13 22:11 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Elijah Newren
  2021-01-13 22:11   ` [PATCH v2 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
@ 2021-01-14 19:08   ` Junio C Hamano
  2021-01-14 20:18     ` Elijah Newren
  2021-01-15 19:29   ` [PATCH v3 " Elijah Newren
  2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2021-01-14 19:08 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau

Elijah Newren <newren@gmail.com> writes:

> This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
> and en/ort-directory-rename.

Thanks.

How ready is the foundation to accept this change?  This will depend
on all of the above three topics and I am not sure what the status
of them is---I think that I've read through the diffcore-rename one
and it was a pleasant read, but I do not recall the reviewer
reactions to the other two.

It does not instill a lot of confidence in these topics that nobody
commented on things like [v2 17/17] of en/ort-directory-rename
(which fixes the code introduced in [v2 06/17] instead of fixing it
at the source before 06/17 copies it from elsewhere [*1*]).  It
looks to me like a sign that these collection of series are moving
too fast than any reviewer to catch up.

Thanks.


[Footnote]

*1* I do not particularly think [06/17] that copies and leaves the
    recursive backend behind is necessarily bad.  What I find more
    disturbing is that nobody seems to have a chance to give any
    look to the two iterations of the series (as far as I can see,
    v2 is just in name only---it removed 18/17 in v1 that were not
    meant to be sent), and we are about to build on top of the
    foundation that nobody knows how solid it is.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/1] And so it begins...merge/rename performance work
  2021-01-14 19:08   ` [PATCH v2 0/1] And so it begins...merge/rename performance work Junio C Hamano
@ 2021-01-14 20:18     ` Elijah Newren
  0 siblings, 0 replies; 17+ messages in thread
From: Elijah Newren @ 2021-01-14 20:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Derrick Stolee, Jeff King, Jonathan Nieder,
	Jonathan Tan, Taylor Blau

Hi Junio,

On Thu, Jan 14, 2021 at 11:08 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
> > and en/ort-directory-rename.
>
> Thanks.
>
> How ready is the foundation to accept this change?  This will depend
> on all of the above three topics and I am not sure what the status
> of them is---I think that I've read through the diffcore-rename one
> and it was a pleasant read, but I do not recall the reviewer
> reactions to the other two.

Good questions.  Let me go through the three in order:

en/ort-conflict-handling: Has already been reviewed; v2 included the
fixes to the issues Stolee noted in v1 and Stolee was happy with v2
(https://lore.kernel.org/git/5f6d5428-36ce-3e91-4916-8968ac1b8686@gmail.com/)
as of a week and a half ago.  I am unaware of any issues; I think it's
ready as-is to merge down to next and then to master.

en/diffcore-rename: Taylor and you both reviewed it; Christian skimmed
over it and noted a typo.  I fixed up the issues all three of you
noted by v3 at the end of December.  I was assuming based on your
comments (at https://lore.kernel.org/git/xmqqwnwnglim.fsf@gitster.c.googlers.com/)
that you are happy with it now, which seems to be supported by the
fact that you've already merged it to next.  I think it's ready to
continue merging on to master.

en/ort-directory-rename: This one is the only question mark in my
mind.  Most of the ort patches I wanted folks to review because I'm
doing things significantly differently than merge-recursive.c does.
This series is an exception; patches 1-16 are just porting over the
exact same algorithm from merge-recursive.c using the new data
structures.  It's a lot of code, because directory rename detection
has a lot of logic to it, but there's not anything new or tricky to it
in relation to what's in merge-recursive.c.  However, even though the
algorithm is essentially just being copied, the differences in data
structures means there are lots of tiny changes all over that make a
direct comparison difficult (unless folks are familiar with the old
logic, but I think only Stefan Beller and I were).  Honestly, I'm not
so sure this series is worth reviewers' time given all those details,
with the exception of patch 17/17.  If folks can review everything,
that's great, but if we have limited review resources, I'd rather
people review the series before this one, and the performance related
ones I'll be posting later.  The testcases are pretty thorough in this
area, in part thanks to additional suggestions from Stefan a few years
back, and the similarity of the logic makes me less concerned about
this topic than any of the others in the merge-ort and diffcore-rename
work that I have and will be doing.


In summary:
  * en/ort-conflict-handling and en/diffcore-rename are ready to merge
down (and you already started on one of them)
  * I'm not sure if anyone will review en/ort-directory-rename (it's
been a week and a half with no review so far), but I'm tempted to
encourage people to save their review effort for other topics.
There's just not much different here than what's found in
merge-recursive.c.  (With the exception of patch 17...)

> It does not instill a lot of confidence in these topics that nobody
> commented on things like [v2 17/17] of en/ort-directory-rename
> (which fixes the code introduced in [v2 06/17] instead of fixing it
> at the source before 06/17 copies it from elsewhere [*1*]).  It
> looks to me like a sign that these collection of series are moving
> too fast than any reviewer to catch up.

I considered just moving 17/17 earlier, and even started that way, but
there are a couple problems with that:

1) The comparison of directory rename detection code between
merge-recursive.c and merge-ort.c, if folks want to compare the two,
is much easier if I first implement the same algorithm

2) 06/17 introduces the concept of counting renames from a directory
and picking the highest count; the idea is simple but but with the
extra complications from 17/17 the combined patch just looks too
obtuse to follow done all in one step.  Splitting 17/17 into a
separate later step was something that I felt would make it easier to
review.

3) 06/17 consists of well-understood code from merge-recursive.c.
17/17 is new.  If we ever want to port the fix from merge-ort to
merge-recursive, having it as a separate change will help with that.


So, although it may look weird, I intentionally changed from a
combined early commit, as you seem to be suggesting here, into
splitting it out this way.  I also considered removing 17/17 from the
series and then submitting it later.

> Thanks.
>
>
> [Footnote]
>
> *1* I do not particularly think [06/17] that copies and leaves the
>     recursive backend behind is necessarily bad.  What I find more
>     disturbing is that nobody seems to have a chance to give any
>     look to the two iterations of the series (as far as I can see,
>     v2 is just in name only---it removed 18/17 in v1 that were not
>     meant to be sent), and we are about to build on top of the
>     foundation that nobody knows how solid it is.

What if we dropped 17/17 from en/ort-directory-rename (the
merge/rename performance work doesn't depend upon it), and then merged
that series down?  Without 17/17, the series is just a port of code
that exists in merge-recursive.c that has been battle tested for
years.  I could submit the final patch separately later.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 0/1] And so it begins...merge/rename performance work
  2021-01-13 22:11 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Elijah Newren
  2021-01-13 22:11   ` [PATCH v2 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
  2021-01-14 19:08   ` [PATCH v2 0/1] And so it begins...merge/rename performance work Junio C Hamano
@ 2021-01-15 19:29   ` Elijah Newren
  2021-01-15 19:29     ` [PATCH v3 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
  2021-01-24  6:01     ` [PATCH v4 0/3] And so it begins...merge/rename performance work Elijah Newren
  2 siblings, 2 replies; 17+ messages in thread
From: Elijah Newren @ 2021-01-15 19:29 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, gitster, Sangeeta, Elijah Newren

This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
and en/ort-directory-rename.

Changes since v2:
  * Add another step I forgot in my testcase setup -- setting
    merge.directoryRenames (noticed by Sangeeta); I've double checked
    that I didn't forget any other settings.

Elijah Newren (1):
  merge-ort: begin performance work; instrument with trace2_region_*
    calls

 diffcore-rename.c |  8 +++++++
 merge-ort.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

Range-diff:
1:  8783f209ef ! 1:  644f458c01 merge-ort: begin performance work; instrument with trace2_region_* calls
    @@ Commit message
           $ git mv drivers pilots  # Introduce over 26,000 renames
           $ git commit -m "Rename drivers/ to pilots/"
           $ git config merge.renameLimit 30000
    +      $ git config merge.directoryRenames true
     
         === Testcases ===
     
-- 
2.29.2.506.ga68ba46ed0.dirty


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls
  2021-01-15 19:29   ` [PATCH v3 " Elijah Newren
@ 2021-01-15 19:29     ` Elijah Newren
  2021-01-24  6:01     ` [PATCH v4 0/3] And so it begins...merge/rename performance work Elijah Newren
  1 sibling, 0 replies; 17+ messages in thread
From: Elijah Newren @ 2021-01-15 19:29 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, gitster, Sangeeta, Elijah Newren, Taylor Blau

Add some timing instrumentation for both merge-ort and diffcore-rename;
I used these to measure and optimize performance in both, and several
future patch series will build on these to reduce the timings of some
select testcases.

=== Setup ===

The primary testcase I used involved rebasing a random topic in the
linux kernel (consisting of 35 patches) against an older version.  I
added two variants, one where I rename a toplevel directory, and another
where I only rebase one patch instead of the whole topic.  The setup is
as follows:

  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e
  $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34
  $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e
  $ git switch -c 5.4-renames v5.4
  $ git mv drivers pilots  # Introduce over 26,000 renames
  $ git commit -m "Rename drivers/ to pilots/"
  $ git config merge.renameLimit 30000
  $ git config merge.directoryRenames true

=== Testcases ===

Now with REBASE standing for either "git rebase [--merge]" (using
merge-recursive) or "test-tool fast-rebase" (using merge-ort), the
testcases are:

Testcase #1: no-renames

  $ git checkout v5.4^0
  $ REBASE --onto HEAD base hwmon-updates

  Note: technically the name is misleading; there are some renames, but
  very few.  Rename detection only takes about half the overall time.

Testcase #2: mega-renames

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-updates

Testcase #3: just-one-mega

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-just-one

=== Timing results ===

Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):

                  merge-recursive         merge-ort
  no-renames:        18.912 s ±  0.174 s    12.975 s ±  0.037 s
  mega-renames:    5964.031 s ± 10.459 s  5154.338 s ± 19.139 s
  just-one-mega:    149.583 s ±  0.751 s   146.703 s ±  0.852 s

A single re-run of each with some breakdowns:

                                  ---  no-renames  ---
                            merge-recursive   merge-ort
  overall runtime:              19.302 s        13.017 s
  inexact rename detection:      7.603 s         7.695 s
  everything else:              11.699 s         5.322 s

                                  --- mega-renames ---
                            merge-recursive   merge-ort
  overall runtime:            5950.195 s      5132.851 s
  inexact rename detection:   5746.309 s      5119.215 s
  everything else:             203.886 s        13.636 s

                                  --- just-one-mega ---
                            merge-recursive   merge-ort
  overall runtime:             151.001 s       146.478 s
  inexact rename detection:    143.448 s       145.901 s
  everything else:               7.553 s         0.577 s

=== Timing observations ===

1) no-renames

1a) merge-ort is faster than merge-recursive, which is nice.  However,
this still should not be considered good enough.  Although the "merge"
backend to rebase (merge-recursive) is sometimes faster than the "apply"
backend, this is one of those cases where it is not.  In fact, even
merge-ort is slower.  The "apply" backend can complete this testcase in
    6.940 s ± 0.485 s
which is about 2x faster than merge-ort and 3x faster than
merge-recursive.  One goal of the merge-ort performance work will be to
make it faster than git-am on this (and similar) testcases.

2) mega-renames

2a) Obviously rename detection is a huge cost; it's where most the time
is spent.  We need to cut that down.  If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
drop to about 204s, and the merge-ort time would drop to about 14s.  I
think this particular stat shows I've subtly baked a couple performance
improvements into merge-ort[A] (one of them large) and into
fast-rebase[B] already.

    [A] Avoid quadratic behavior with O(N) insertions or removals
	of entries in the index & avoid unconditional dropping and
        re-reading of the index
    [B] Avoid updating the on-disk index or the working directory
        for intermediate patches -- only update at the end

2b) rename-detection is somehow ~10% cheaper for merge-ort than
merge-recursive.  This was and is a big surprise to me.  Both of them
call diff_tree_oid() and diffcore_std() with the EXACT same inputs.  I
don't have an explanation, but it is very consistent even after
re-running many times.  Interestingly, the rename detection for the
first patch is more expensive (just barely) for merge-ort than
merge-recursive, and that is also consistent.  I won't investigate this
further, as I'm just going to focus on 1a & 2a.

3) just-one-mega

3a) not much to say here, it just gives some flavor for how rebasing
only one patch compares to rebasing 35.

=== Goals ===

This patch is obviously just the beginning.  Here are some of my goals
that this measurement will help us achieve:

* Drive the cost of rename detection down considerably for merges
* After the above has been achieved, see if there are other slowness
  factors (which would have previously been overshadowed by rename
  detection costs) which we can then focus on and also optimize.
* Ensure our rebase testcase that requires little rename detection
  is noticeably faster with merge-ort than with apply-based rebase.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <ttaylorr@github.com>
---
 diffcore-rename.c |  8 +++++++
 merge-ort.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 90db9ebd6d..8fe6c9384b 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -465,6 +465,7 @@ void diffcore_rename(struct diff_options *options)
 	int num_destinations, dst_cnt;
 	struct progress *progress = NULL;
 
+	trace2_region_enter("diff", "setup", options->repo);
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
 
@@ -510,14 +511,17 @@ void diffcore_rename(struct diff_options *options)
 			register_rename_src(p);
 		}
 	}
+	trace2_region_leave("diff", "setup", options->repo);
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
 		goto cleanup; /* nothing to do */
 
+	trace2_region_enter("diff", "exact renames", options->repo);
 	/*
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
 	rename_count = find_exact_renames(options);
+	trace2_region_leave("diff", "exact renames", options->repo);
 
 	/* Did we only want exact renames? */
 	if (minimum_score == MAX_SCORE)
@@ -545,6 +549,7 @@ void diffcore_rename(struct diff_options *options)
 		break;
 	}
 
+	trace2_region_enter("diff", "inexact renames", options->repo);
 	if (options->show_rename_progress) {
 		progress = start_delayed_progress(
 				_("Performing inexact rename detection"),
@@ -600,11 +605,13 @@ void diffcore_rename(struct diff_options *options)
 	if (detect_rename == DIFF_DETECT_COPY)
 		rename_count += find_renames(mx, dst_cnt, minimum_score, 1);
 	free(mx);
+	trace2_region_leave("diff", "inexact renames", options->repo);
 
  cleanup:
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+	trace2_region_enter("diff", "write back to queue", options->repo);
 	DIFF_QUEUE_CLEAR(&outq);
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
@@ -680,5 +687,6 @@ void diffcore_rename(struct diff_options *options)
 		strintmap_clear(break_idx);
 		FREE_AND_NULL(break_idx);
 	}
+	trace2_region_leave("diff", "write back to queue", options->repo);
 	return;
 }
diff --git a/merge-ort.c b/merge-ort.c
index 8f4ca4fe83..0f3ad78f3c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -752,7 +752,9 @@ static int collect_merge_info(struct merge_options *opt,
 	init_tree_desc(t + 1, side1->buffer, side1->size);
 	init_tree_desc(t + 2, side2->buffer, side2->size);
 
+	trace2_region_enter("merge", "traverse_trees", opt->repo);
 	ret = traverse_trees(NULL, 3, t, &info);
+	trace2_region_leave("merge", "traverse_trees", opt->repo);
 
 	return ret;
 }
@@ -2095,9 +2097,12 @@ static void detect_regular_renames(struct merge_options *opt,
 	diff_opts.show_rename_progress = opt->show_rename_progress;
 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	diff_setup_done(&diff_opts);
+
+	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diff_tree_oid(&merge_base->object.oid, &side->object.oid, "",
 		      &diff_opts);
 	diffcore_std(&diff_opts);
+	trace2_region_leave("diff", "diffcore_rename", opt->repo);
 
 	if (diff_opts.needed_rename_limit > renames->needed_limit)
 		renames->needed_limit = diff_opts.needed_rename_limit;
@@ -2196,9 +2201,12 @@ static int detect_and_process_renames(struct merge_options *opt,
 
 	memset(&combined, 0, sizeof(combined));
 
+	trace2_region_enter("merge", "regular renames", opt->repo);
 	detect_regular_renames(opt, merge_base, side1, MERGE_SIDE1);
 	detect_regular_renames(opt, merge_base, side2, MERGE_SIDE2);
+	trace2_region_leave("merge", "regular renames", opt->repo);
 
+	trace2_region_enter("merge", "directory renames", opt->repo);
 	need_dir_renames =
 	  !opt->priv->call_depth &&
 	  (opt->detect_directory_renames == MERGE_DIRECTORY_RENAMES_TRUE ||
@@ -2220,8 +2228,11 @@ static int detect_and_process_renames(struct merge_options *opt,
 				 &renames->dir_renames[1],
 				 &renames->dir_renames[2]);
 	QSORT(combined.queue, combined.nr, compare_pairs);
+	trace2_region_leave("merge", "directory renames", opt->repo);
 
+	trace2_region_enter("merge", "process renames", opt->repo);
 	clean &= process_renames(opt, &combined);
+	trace2_region_leave("merge", "process renames", opt->repo);
 
 	/* Free memory for renames->pairs[] and combined */
 	for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) {
@@ -2903,20 +2914,30 @@ static void process_entries(struct merge_options *opt,
 						   STRING_LIST_INIT_NODUP,
 						   NULL, 0 };
 
+	trace2_region_enter("merge", "process_entries setup", opt->repo);
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
 		return;
 	}
 
 	/* Hack to pre-allocate plist to the desired size */
+	trace2_region_enter("merge", "plist grow", opt->repo);
 	ALLOC_GROW(plist.items, strmap_get_size(&opt->priv->paths), plist.alloc);
+	trace2_region_leave("merge", "plist grow", opt->repo);
 
 	/* Put every entry from paths into plist, then sort */
+	trace2_region_enter("merge", "plist copy", opt->repo);
 	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
 		string_list_append(&plist, e->key)->util = e->value;
 	}
+	trace2_region_leave("merge", "plist copy", opt->repo);
+
+	trace2_region_enter("merge", "plist special sort", opt->repo);
 	plist.cmp = string_list_df_name_compare;
 	string_list_sort(&plist);
+	trace2_region_leave("merge", "plist special sort", opt->repo);
+
+	trace2_region_leave("merge", "process_entries setup", opt->repo);
 
 	/*
 	 * Iterate over the items in reverse order, so we can handle paths
@@ -2927,6 +2948,7 @@ static void process_entries(struct merge_options *opt,
 	 * (because it allows us to know whether the directory is still in
 	 * the way when it is time to process the file at the same path).
 	 */
+	trace2_region_enter("merge", "processing", opt->repo);
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
 		/*
@@ -2945,7 +2967,9 @@ static void process_entries(struct merge_options *opt,
 			process_entry(opt, path, ci, &dir_metadata);
 		}
 	}
+	trace2_region_leave("merge", "processing", opt->repo);
 
+	trace2_region_enter("merge", "process_entries cleanup", opt->repo);
 	if (dir_metadata.offsets.nr != 1 ||
 	    (uintptr_t)dir_metadata.offsets.items[0].util != 0) {
 		printf("dir_metadata.offsets.nr = %d (should be 1)\n",
@@ -2960,6 +2984,7 @@ static void process_entries(struct merge_options *opt,
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
 	string_list_clear(&dir_metadata.offsets, 0);
+	trace2_region_leave("merge", "process_entries cleanup", opt->repo);
 }
 
 /*** Function Grouping: functions related to merge_switch_to_result() ***/
@@ -3118,12 +3143,15 @@ void merge_switch_to_result(struct merge_options *opt,
 	if (result->clean >= 0 && update_worktree_and_index) {
 		struct merge_options_internal *opti = result->priv;
 
+		trace2_region_enter("merge", "checkout", opt->repo);
 		if (checkout(opt, head, result->tree)) {
 			/* failure to function */
 			result->clean = -1;
 			return;
 		}
+		trace2_region_leave("merge", "checkout", opt->repo);
 
+		trace2_region_enter("merge", "record_conflicted", opt->repo);
 		if (record_conflicted_index_entries(opt, opt->repo->index,
 						    &opti->paths,
 						    &opti->conflicted)) {
@@ -3131,6 +3159,7 @@ void merge_switch_to_result(struct merge_options *opt,
 			result->clean = -1;
 			return;
 		}
+		trace2_region_leave("merge", "record_conflicted", opt->repo);
 	}
 
 	if (display_update_msgs) {
@@ -3140,6 +3169,8 @@ void merge_switch_to_result(struct merge_options *opt,
 		struct string_list olist = STRING_LIST_INIT_NODUP;
 		int i;
 
+		trace2_region_enter("merge", "display messages", opt->repo);
+
 		/* Hack to pre-allocate olist to the desired size */
 		ALLOC_GROW(olist.items, strmap_get_size(&opti->output),
 			   olist.alloc);
@@ -3161,6 +3192,8 @@ void merge_switch_to_result(struct merge_options *opt,
 		/* Also include needed rename limit adjustment now */
 		diff_warn_rename_limit("merge.renamelimit",
 				       opti->renames.needed_limit, 0);
+
+		trace2_region_leave("merge", "display messages", opt->repo);
 	}
 
 	merge_finalize(opt, result);
@@ -3202,6 +3235,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	int i;
 
 	/* Sanity checks on opt */
+	trace2_region_enter("merge", "sanity checks", opt->repo);
 	assert(opt->repo);
 
 	assert(opt->branch1 && opt->branch2);
@@ -3228,11 +3262,13 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	assert(opt->obuf.len == 0);
 
 	assert(opt->priv == NULL);
+	trace2_region_leave("merge", "sanity checks", opt->repo);
 
 	/* Default to histogram diff.  Actually, just hardcode it...for now. */
 	opt->xdl_opts = DIFF_WITH_ALG(opt, HISTOGRAM_DIFF);
 
 	/* Initialization of opt->priv, our internal merge data */
+	trace2_region_enter("merge", "allocate/init", opt->repo);
 	opt->priv = xcalloc(1, sizeof(*opt->priv));
 
 	/* Initialization of various renames fields */
@@ -3265,6 +3301,8 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * subset of the overall paths that have special output.
 	 */
 	strmap_init(&opt->priv->output);
+
+	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
 /*** Function Grouping: merge_incore_*() and their internal variants ***/
@@ -3280,6 +3318,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 {
 	struct object_id working_tree_oid;
 
+	trace2_region_enter("merge", "collect_merge_info", opt->repo);
 	if (collect_merge_info(opt, merge_base, side1, side2) != 0) {
 		/*
 		 * TRANSLATORS: The %s arguments are: 1) tree hash of a merge
@@ -3292,10 +3331,16 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 		result->clean = -1;
 		return;
 	}
+	trace2_region_leave("merge", "collect_merge_info", opt->repo);
 
+	trace2_region_enter("merge", "renames", opt->repo);
 	result->clean = detect_and_process_renames(opt, merge_base,
 						   side1, side2);
+	trace2_region_leave("merge", "renames", opt->repo);
+
+	trace2_region_enter("merge", "process_entries", opt->repo);
 	process_entries(opt, &working_tree_oid);
+	trace2_region_leave("merge", "process_entries", opt->repo);
 
 	/* Set return values */
 	result->tree = parse_tree_indirect(&working_tree_oid);
@@ -3396,9 +3441,15 @@ void merge_incore_nonrecursive(struct merge_options *opt,
 			       struct tree *side2,
 			       struct merge_result *result)
 {
+	trace2_region_enter("merge", "incore_nonrecursive", opt->repo);
+
+	trace2_region_enter("merge", "merge_start", opt->repo);
 	assert(opt->ancestor != NULL);
 	merge_start(opt, result);
+	trace2_region_leave("merge", "merge_start", opt->repo);
+
 	merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);
+	trace2_region_leave("merge", "incore_nonrecursive", opt->repo);
 }
 
 void merge_incore_recursive(struct merge_options *opt,
@@ -3407,9 +3458,15 @@ void merge_incore_recursive(struct merge_options *opt,
 			    struct commit *side2,
 			    struct merge_result *result)
 {
+	trace2_region_enter("merge", "incore_recursive", opt->repo);
+
 	/* We set the ancestor label based on the merge_bases */
 	assert(opt->ancestor == NULL);
 
+	trace2_region_enter("merge", "merge_start", opt->repo);
 	merge_start(opt, result);
+	trace2_region_leave("merge", "merge_start", opt->repo);
+
 	merge_ort_internal(opt, merge_bases, side1, side2, result);
+	trace2_region_leave("merge", "incore_recursive", opt->repo);
 }
-- 
2.29.2.506.ga68ba46ed0.dirty


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 0/3] And so it begins...merge/rename performance work
  2021-01-15 19:29   ` [PATCH v3 " Elijah Newren
  2021-01-15 19:29     ` [PATCH v3 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
@ 2021-01-24  6:01     ` Elijah Newren
  2021-01-24  6:01       ` [PATCH v4 1/3] merge-ort: fix massive leak Elijah Newren
                         ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Elijah Newren @ 2021-01-24  6:01 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, gitster, Elijah Newren

This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
and en/ort-directory-rename.

Changes since v3:
  * Add a couple preliminary patches needed for later performance work,
    including fixing a big memory leak (in some series that has already
    been merged to master, I should have included a small additional
    section of code from my 'ort' branch, but I overlooked it).
  * Update the performance numbers in the commit message of the final
    patch based on the changes; the memory leak makes a noticeable
    difference on the overall timings since it basically represents
    a combination of all the data allocated during the merge algorithm.

Elijah Newren (3):
  merge-ort: fix massive leak
  merge-ort: ignore the directory rename split conflict for now
  merge-ort: begin performance work; instrument with trace2_region_*
    calls

 diffcore-rename.c |  8 +++++
 merge-ort.c       | 87 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 94 insertions(+), 1 deletion(-)

Range-diff:
-:  ---------- > 1:  549a63cd2a merge-ort: fix massive leak
-:  ---------- > 2:  817a197dbc merge-ort: ignore the directory rename split conflict for now
1:  36d1f87d05 ! 3:  7f7d4a3e17 merge-ort: begin performance work; instrument with trace2_region_* calls
    @@ Commit message
         Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
         10 runs for the other two cases):
     
    -                      merge-recursive         merge-ort
    -      no-renames:        18.912 s ±  0.174 s    12.975 s ±  0.037 s
    -      mega-renames:    5964.031 s ± 10.459 s  5154.338 s ± 19.139 s
    -      just-one-mega:    149.583 s ±  0.751 s   146.703 s ±  0.852 s
    +                           merge-recursive           merge-ort
    +        no-renames:       18.912 s ±  0.174 s    14.263 s ±  0.053 s
    +        mega-renames:   5964.031 s ± 10.459 s  5504.231 s ±  5.150 s
    +        just-one-mega:   149.583 s ±  0.751 s   158.534 s ±  0.498 s
     
         A single re-run of each with some breakdowns:
     
    -                                      ---  no-renames  ---
    -                                merge-recursive   merge-ort
    -      overall runtime:              19.302 s        13.017 s
    -      inexact rename detection:      7.603 s         7.695 s
    -      everything else:              11.699 s         5.322 s
    +                                        ---  no-renames  ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:              19.302 s        14.257 s
    +        inexact rename detection:      7.603 s         7.906 s
    +        everything else:              11.699 s         6.351 s
     
    -                                      --- mega-renames ---
    -                                merge-recursive   merge-ort
    -      overall runtime:            5950.195 s      5132.851 s
    -      inexact rename detection:   5746.309 s      5119.215 s
    -      everything else:             203.886 s        13.636 s
    +                                        --- mega-renames ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:            5950.195 s      5499.672 s
    +        inexact rename detection:   5746.309 s      5487.120 s
    +        everything else:             203.886 s        17.552 s
     
    -                                      --- just-one-mega ---
    -                                merge-recursive   merge-ort
    -      overall runtime:             151.001 s       146.478 s
    -      inexact rename detection:    143.448 s       145.901 s
    -      everything else:               7.553 s         0.577 s
    +                                        --- just-one-mega ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:             151.001 s       158.582 s
    +        inexact rename detection:    143.448 s       157.835 s
    +        everything else:               7.553 s         0.747 s
     
         === Timing observations ===
     
    +    0) Maximum speedup
    +
    +    The "everything else" row represents the maximum speedup we could
    +    achieve if we were to somehow infinitely parallelize inexact rename
    +    detection, but leave everything else alone.  The fact that this is so
    +    much smaller than the real runtime (even in the case with virtually no
    +    renames) makes it clear just how overwhelmingly large the time spent on
    +    rename detection can be.
    +
         1) no-renames
     
         1a) merge-ort is faster than merge-recursive, which is nice.  However,
    @@ Commit message
         2a) Obviously rename detection is a huge cost; it's where most the time
         is spent.  We need to cut that down.  If we could somehow infinitely
         parallelize it and drive its time to 0, the merge-recursive time would
    -    drop to about 204s, and the merge-ort time would drop to about 14s.  I
    +    drop to about 204s, and the merge-ort time would drop to about 17s.  I
         think this particular stat shows I've subtly baked a couple performance
    -    improvements into merge-ort[A] (one of them large) and into
    -    fast-rebase[B] already.
    -
    -        [A] Avoid quadratic behavior with O(N) insertions or removals
    -            of entries in the index & avoid unconditional dropping and
    -            re-reading of the index
    -        [B] Avoid updating the on-disk index or the working directory
    -            for intermediate patches -- only update at the end
    -
    -    2b) rename-detection is somehow ~10% cheaper for merge-ort than
    -    merge-recursive.  This was and is a big surprise to me.  Both of them
    -    call diff_tree_oid() and diffcore_std() with the EXACT same inputs.  I
    -    don't have an explanation, but it is very consistent even after
    -    re-running many times.  Interestingly, the rename detection for the
    -    first patch is more expensive (just barely) for merge-ort than
    -    merge-recursive, and that is also consistent.  I won't investigate this
    -    further, as I'm just going to focus on 1a & 2a.
    +    improvements into merge-ort and into fast-rebase already.
     
         3) just-one-mega
     
    @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
      
      	assert(opt->branch1 && opt->branch2);
     @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
    - 	assert(opt->obuf.len == 0);
    - 
    - 	assert(opt->priv == NULL);
    + 		assert(!opt->priv->toplevel_dir ||
    + 		       0 == strlen(opt->priv->toplevel_dir));
    + 	}
     +	trace2_region_leave("merge", "sanity checks", opt->repo);
      
      	/* Default to histogram diff.  Actually, just hardcode it...for now. */
    @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
      
      	/* Initialization of opt->priv, our internal merge data */
     +	trace2_region_enter("merge", "allocate/init", opt->repo);
    - 	opt->priv = xcalloc(1, sizeof(*opt->priv));
    - 
    - 	/* Initialization of various renames fields */
    + 	if (opt->priv) {
    + 		clear_or_reinit_internal_opts(opt->priv, 1);
    + 		trace2_region_leave("merge", "allocate/init", opt->repo);
     @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
      	 * subset of the overall paths that have special output.
      	 */
-- 
2.30.0.135.g7f7d4a3e17


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 1/3] merge-ort: fix massive leak
  2021-01-24  6:01     ` [PATCH v4 0/3] And so it begins...merge/rename performance work Elijah Newren
@ 2021-01-24  6:01       ` Elijah Newren
  2021-01-24 19:11         ` Derrick Stolee
  2021-01-24  6:01       ` [PATCH v4 2/3] merge-ort: ignore the directory rename split conflict for now Elijah Newren
  2021-01-24  6:01       ` [PATCH v4 3/3] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
  2 siblings, 1 reply; 17+ messages in thread
From: Elijah Newren @ 2021-01-24  6:01 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, gitster, Elijah Newren

When a series of merges was performed (such as for a rebase or series of
cherry-picks), only the data structures allocated by the final merge
operation were being freed.  The problem was that while picking out
pieces of merge-ort to upstream, I previously misread a certain section
of merge_start() and assumed it was associated with a later
optimization.  Include that section now, which ensures that if there was
a previous merge operation, that we clear out result->priv and then
re-use it for opt->priv, and otherwise we allocate opt->priv.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 05c6b2e0dc..b5845ff6e9 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3227,11 +3227,28 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	assert(opt->obuf.len == 0);
 
 	assert(opt->priv == NULL);
+	if (result->priv) {
+		opt->priv = result->priv;
+		result->priv = NULL;
+		/*
+		 * opt->priv non-NULL means we had results from a previous
+		 * run; do a few sanity checks that user didn't mess with
+		 * it in an obvious fashion.
+		 */
+		assert(opt->priv->call_depth == 0);
+		assert(!opt->priv->toplevel_dir ||
+		       0 == strlen(opt->priv->toplevel_dir));
+	}
 
 	/* Default to histogram diff.  Actually, just hardcode it...for now. */
 	opt->xdl_opts = DIFF_WITH_ALG(opt, HISTOGRAM_DIFF);
 
 	/* Initialization of opt->priv, our internal merge data */
+	if (opt->priv) {
+		clear_or_reinit_internal_opts(opt->priv, 1);
+		trace2_region_leave("merge", "allocate/init", opt->repo);
+		return;
+	}
 	opt->priv = xcalloc(1, sizeof(*opt->priv));
 
 	/* Initialization of various renames fields */
-- 
2.30.0.135.g7f7d4a3e17


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 2/3] merge-ort: ignore the directory rename split conflict for now
  2021-01-24  6:01     ` [PATCH v4 0/3] And so it begins...merge/rename performance work Elijah Newren
  2021-01-24  6:01       ` [PATCH v4 1/3] merge-ort: fix massive leak Elijah Newren
@ 2021-01-24  6:01       ` Elijah Newren
  2021-01-24  6:01       ` [PATCH v4 3/3] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
  2 siblings, 0 replies; 17+ messages in thread
From: Elijah Newren @ 2021-01-24  6:01 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, gitster, Elijah Newren

get_provisional_directory_renames() has code to detect directories being
evenly split between different locations.  However, as noted previously,
if there are no new files added to that directory that was split evenly,
our inability to determine where the directory was renamed to doesn't
matter since there are no new files to try to move into the new
location.  Unfortunately, that code is unaware of whether there are new
files under the directory in question and we just ignore that, causing
us to fail t6423 test 2b but pass test 2a; turn off the error for now,
swapping which tests pass and fail.

The motivating reason for switching this off as a temporary measure is
that as we add optimizations, we'll start looking at only subsets of
renames, and subsets of renames can start switching the result we get
when this error is (wrongly) on.  Once we get enough optimizations,
however, we can prevent that code from even running when there are no
new files added to the relevant directory, at which point we can revert
this commit and then both testcases 2a and 2b will pass simultaneously.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index b5845ff6e9..f04fab96d7 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1439,7 +1439,18 @@ static void get_provisional_directory_renames(struct merge_options *opt,
 				 "no destination getting a majority of the "
 				 "files."),
 			       source_dir);
-			*clean = 0;
+			/*
+			 * We should mark this as unclean IF something attempts
+			 * to use this rename.  We do not yet have the logic
+			 * in place to detect if this directory rename is being
+			 * used, and optimizations that reduce the number of
+			 * renames cause this to falsely trigger.  For now,
+			 * just disable it, causing t6423 testcase 2a to break.
+			 * We'll later fix the detection, and when we do we
+			 * will re-enable setting *clean to 0 (and thereby fix
+			 * t6423 testcase 2a).
+			 */
+			/*   *clean = 0;   */
 		} else {
 			strmap_put(&renames->dir_renames[side],
 				   source_dir, (void*)best);
-- 
2.30.0.135.g7f7d4a3e17


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 3/3] merge-ort: begin performance work; instrument with trace2_region_* calls
  2021-01-24  6:01     ` [PATCH v4 0/3] And so it begins...merge/rename performance work Elijah Newren
  2021-01-24  6:01       ` [PATCH v4 1/3] merge-ort: fix massive leak Elijah Newren
  2021-01-24  6:01       ` [PATCH v4 2/3] merge-ort: ignore the directory rename split conflict for now Elijah Newren
@ 2021-01-24  6:01       ` Elijah Newren
  2 siblings, 0 replies; 17+ messages in thread
From: Elijah Newren @ 2021-01-24  6:01 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, gitster, Elijah Newren, Taylor Blau

Add some timing instrumentation for both merge-ort and diffcore-rename;
I used these to measure and optimize performance in both, and several
future patch series will build on these to reduce the timings of some
select testcases.

=== Setup ===

The primary testcase I used involved rebasing a random topic in the
linux kernel (consisting of 35 patches) against an older version.  I
added two variants, one where I rename a toplevel directory, and another
where I only rebase one patch instead of the whole topic.  The setup is
as follows:

  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e
  $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34
  $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e
  $ git switch -c 5.4-renames v5.4
  $ git mv drivers pilots  # Introduce over 26,000 renames
  $ git commit -m "Rename drivers/ to pilots/"
  $ git config merge.renameLimit 30000
  $ git config merge.directoryRenames true

=== Testcases ===

Now with REBASE standing for either "git rebase [--merge]" (using
merge-recursive) or "test-tool fast-rebase" (using merge-ort), the
testcases are:

Testcase #1: no-renames

  $ git checkout v5.4^0
  $ REBASE --onto HEAD base hwmon-updates

  Note: technically the name is misleading; there are some renames, but
  very few.  Rename detection only takes about half the overall time.

Testcase #2: mega-renames

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-updates

Testcase #3: just-one-mega

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-just-one

=== Timing results ===

Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):

                       merge-recursive           merge-ort
    no-renames:       18.912 s ±  0.174 s    14.263 s ±  0.053 s
    mega-renames:   5964.031 s ± 10.459 s  5504.231 s ±  5.150 s
    just-one-mega:   149.583 s ±  0.751 s   158.534 s ±  0.498 s

A single re-run of each with some breakdowns:

                                    ---  no-renames  ---
                              merge-recursive   merge-ort
    overall runtime:              19.302 s        14.257 s
    inexact rename detection:      7.603 s         7.906 s
    everything else:              11.699 s         6.351 s

                                    --- mega-renames ---
                              merge-recursive   merge-ort
    overall runtime:            5950.195 s      5499.672 s
    inexact rename detection:   5746.309 s      5487.120 s
    everything else:             203.886 s        17.552 s

                                    --- just-one-mega ---
                              merge-recursive   merge-ort
    overall runtime:             151.001 s       158.582 s
    inexact rename detection:    143.448 s       157.835 s
    everything else:               7.553 s         0.747 s

=== Timing observations ===

0) Maximum speedup

The "everything else" row represents the maximum speedup we could
achieve if we were to somehow infinitely parallelize inexact rename
detection, but leave everything else alone.  The fact that this is so
much smaller than the real runtime (even in the case with virtually no
renames) makes it clear just how overwhelmingly large the time spent on
rename detection can be.

1) no-renames

1a) merge-ort is faster than merge-recursive, which is nice.  However,
this still should not be considered good enough.  Although the "merge"
backend to rebase (merge-recursive) is sometimes faster than the "apply"
backend, this is one of those cases where it is not.  In fact, even
merge-ort is slower.  The "apply" backend can complete this testcase in
    6.940 s ± 0.485 s
which is about 2x faster than merge-ort and 3x faster than
merge-recursive.  One goal of the merge-ort performance work will be to
make it faster than git-am on this (and similar) testcases.

2) mega-renames

2a) Obviously rename detection is a huge cost; it's where most the time
is spent.  We need to cut that down.  If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
drop to about 204s, and the merge-ort time would drop to about 17s.  I
think this particular stat shows I've subtly baked a couple performance
improvements into merge-ort and into fast-rebase already.

3) just-one-mega

3a) not much to say here, it just gives some flavor for how rebasing
only one patch compares to rebasing 35.

=== Goals ===

This patch is obviously just the beginning.  Here are some of my goals
that this measurement will help us achieve:

* Drive the cost of rename detection down considerably for merges
* After the above has been achieved, see if there are other slowness
  factors (which would have previously been overshadowed by rename
  detection costs) which we can then focus on and also optimize.
* Ensure our rebase testcase that requires little rename detection
  is noticeably faster with merge-ort than with apply-based rebase.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <ttaylorr@github.com>
---
 diffcore-rename.c |  8 +++++++
 merge-ort.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 90db9ebd6d..8fe6c9384b 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -465,6 +465,7 @@ void diffcore_rename(struct diff_options *options)
 	int num_destinations, dst_cnt;
 	struct progress *progress = NULL;
 
+	trace2_region_enter("diff", "setup", options->repo);
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
 
@@ -510,14 +511,17 @@ void diffcore_rename(struct diff_options *options)
 			register_rename_src(p);
 		}
 	}
+	trace2_region_leave("diff", "setup", options->repo);
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
 		goto cleanup; /* nothing to do */
 
+	trace2_region_enter("diff", "exact renames", options->repo);
 	/*
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
 	rename_count = find_exact_renames(options);
+	trace2_region_leave("diff", "exact renames", options->repo);
 
 	/* Did we only want exact renames? */
 	if (minimum_score == MAX_SCORE)
@@ -545,6 +549,7 @@ void diffcore_rename(struct diff_options *options)
 		break;
 	}
 
+	trace2_region_enter("diff", "inexact renames", options->repo);
 	if (options->show_rename_progress) {
 		progress = start_delayed_progress(
 				_("Performing inexact rename detection"),
@@ -600,11 +605,13 @@ void diffcore_rename(struct diff_options *options)
 	if (detect_rename == DIFF_DETECT_COPY)
 		rename_count += find_renames(mx, dst_cnt, minimum_score, 1);
 	free(mx);
+	trace2_region_leave("diff", "inexact renames", options->repo);
 
  cleanup:
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+	trace2_region_enter("diff", "write back to queue", options->repo);
 	DIFF_QUEUE_CLEAR(&outq);
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
@@ -680,5 +687,6 @@ void diffcore_rename(struct diff_options *options)
 		strintmap_clear(break_idx);
 		FREE_AND_NULL(break_idx);
 	}
+	trace2_region_leave("diff", "write back to queue", options->repo);
 	return;
 }
diff --git a/merge-ort.c b/merge-ort.c
index f04fab96d7..931b91438c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -752,7 +752,9 @@ static int collect_merge_info(struct merge_options *opt,
 	init_tree_desc(t + 1, side1->buffer, side1->size);
 	init_tree_desc(t + 2, side2->buffer, side2->size);
 
+	trace2_region_enter("merge", "traverse_trees", opt->repo);
 	ret = traverse_trees(NULL, 3, t, &info);
+	trace2_region_leave("merge", "traverse_trees", opt->repo);
 
 	return ret;
 }
@@ -2105,9 +2107,12 @@ static void detect_regular_renames(struct merge_options *opt,
 	diff_opts.show_rename_progress = opt->show_rename_progress;
 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	diff_setup_done(&diff_opts);
+
+	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diff_tree_oid(&merge_base->object.oid, &side->object.oid, "",
 		      &diff_opts);
 	diffcore_std(&diff_opts);
+	trace2_region_leave("diff", "diffcore_rename", opt->repo);
 
 	if (diff_opts.needed_rename_limit > renames->needed_limit)
 		renames->needed_limit = diff_opts.needed_rename_limit;
@@ -2206,9 +2211,12 @@ static int detect_and_process_renames(struct merge_options *opt,
 
 	memset(&combined, 0, sizeof(combined));
 
+	trace2_region_enter("merge", "regular renames", opt->repo);
 	detect_regular_renames(opt, merge_base, side1, MERGE_SIDE1);
 	detect_regular_renames(opt, merge_base, side2, MERGE_SIDE2);
+	trace2_region_leave("merge", "regular renames", opt->repo);
 
+	trace2_region_enter("merge", "directory renames", opt->repo);
 	need_dir_renames =
 	  !opt->priv->call_depth &&
 	  (opt->detect_directory_renames == MERGE_DIRECTORY_RENAMES_TRUE ||
@@ -2230,8 +2238,11 @@ static int detect_and_process_renames(struct merge_options *opt,
 				 &renames->dir_renames[1],
 				 &renames->dir_renames[2]);
 	QSORT(combined.queue, combined.nr, compare_pairs);
+	trace2_region_leave("merge", "directory renames", opt->repo);
 
+	trace2_region_enter("merge", "process renames", opt->repo);
 	clean &= process_renames(opt, &combined);
+	trace2_region_leave("merge", "process renames", opt->repo);
 
 	/* Free memory for renames->pairs[] and combined */
 	for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) {
@@ -2913,20 +2924,30 @@ static void process_entries(struct merge_options *opt,
 						   STRING_LIST_INIT_NODUP,
 						   NULL, 0 };
 
+	trace2_region_enter("merge", "process_entries setup", opt->repo);
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
 		return;
 	}
 
 	/* Hack to pre-allocate plist to the desired size */
+	trace2_region_enter("merge", "plist grow", opt->repo);
 	ALLOC_GROW(plist.items, strmap_get_size(&opt->priv->paths), plist.alloc);
+	trace2_region_leave("merge", "plist grow", opt->repo);
 
 	/* Put every entry from paths into plist, then sort */
+	trace2_region_enter("merge", "plist copy", opt->repo);
 	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
 		string_list_append(&plist, e->key)->util = e->value;
 	}
+	trace2_region_leave("merge", "plist copy", opt->repo);
+
+	trace2_region_enter("merge", "plist special sort", opt->repo);
 	plist.cmp = string_list_df_name_compare;
 	string_list_sort(&plist);
+	trace2_region_leave("merge", "plist special sort", opt->repo);
+
+	trace2_region_leave("merge", "process_entries setup", opt->repo);
 
 	/*
 	 * Iterate over the items in reverse order, so we can handle paths
@@ -2937,6 +2958,7 @@ static void process_entries(struct merge_options *opt,
 	 * (because it allows us to know whether the directory is still in
 	 * the way when it is time to process the file at the same path).
 	 */
+	trace2_region_enter("merge", "processing", opt->repo);
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
 		/*
@@ -2955,7 +2977,9 @@ static void process_entries(struct merge_options *opt,
 			process_entry(opt, path, ci, &dir_metadata);
 		}
 	}
+	trace2_region_leave("merge", "processing", opt->repo);
 
+	trace2_region_enter("merge", "process_entries cleanup", opt->repo);
 	if (dir_metadata.offsets.nr != 1 ||
 	    (uintptr_t)dir_metadata.offsets.items[0].util != 0) {
 		printf("dir_metadata.offsets.nr = %d (should be 1)\n",
@@ -2970,6 +2994,7 @@ static void process_entries(struct merge_options *opt,
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
 	string_list_clear(&dir_metadata.offsets, 0);
+	trace2_region_leave("merge", "process_entries cleanup", opt->repo);
 }
 
 /*** Function Grouping: functions related to merge_switch_to_result() ***/
@@ -3128,12 +3153,15 @@ void merge_switch_to_result(struct merge_options *opt,
 	if (result->clean >= 0 && update_worktree_and_index) {
 		struct merge_options_internal *opti = result->priv;
 
+		trace2_region_enter("merge", "checkout", opt->repo);
 		if (checkout(opt, head, result->tree)) {
 			/* failure to function */
 			result->clean = -1;
 			return;
 		}
+		trace2_region_leave("merge", "checkout", opt->repo);
 
+		trace2_region_enter("merge", "record_conflicted", opt->repo);
 		if (record_conflicted_index_entries(opt, opt->repo->index,
 						    &opti->paths,
 						    &opti->conflicted)) {
@@ -3141,6 +3169,7 @@ void merge_switch_to_result(struct merge_options *opt,
 			result->clean = -1;
 			return;
 		}
+		trace2_region_leave("merge", "record_conflicted", opt->repo);
 	}
 
 	if (display_update_msgs) {
@@ -3150,6 +3179,8 @@ void merge_switch_to_result(struct merge_options *opt,
 		struct string_list olist = STRING_LIST_INIT_NODUP;
 		int i;
 
+		trace2_region_enter("merge", "display messages", opt->repo);
+
 		/* Hack to pre-allocate olist to the desired size */
 		ALLOC_GROW(olist.items, strmap_get_size(&opti->output),
 			   olist.alloc);
@@ -3171,6 +3202,8 @@ void merge_switch_to_result(struct merge_options *opt,
 		/* Also include needed rename limit adjustment now */
 		diff_warn_rename_limit("merge.renamelimit",
 				       opti->renames.needed_limit, 0);
+
+		trace2_region_leave("merge", "display messages", opt->repo);
 	}
 
 	merge_finalize(opt, result);
@@ -3212,6 +3245,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	int i;
 
 	/* Sanity checks on opt */
+	trace2_region_enter("merge", "sanity checks", opt->repo);
 	assert(opt->repo);
 
 	assert(opt->branch1 && opt->branch2);
@@ -3250,11 +3284,13 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 		assert(!opt->priv->toplevel_dir ||
 		       0 == strlen(opt->priv->toplevel_dir));
 	}
+	trace2_region_leave("merge", "sanity checks", opt->repo);
 
 	/* Default to histogram diff.  Actually, just hardcode it...for now. */
 	opt->xdl_opts = DIFF_WITH_ALG(opt, HISTOGRAM_DIFF);
 
 	/* Initialization of opt->priv, our internal merge data */
+	trace2_region_enter("merge", "allocate/init", opt->repo);
 	if (opt->priv) {
 		clear_or_reinit_internal_opts(opt->priv, 1);
 		trace2_region_leave("merge", "allocate/init", opt->repo);
@@ -3292,6 +3328,8 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * subset of the overall paths that have special output.
 	 */
 	strmap_init(&opt->priv->output);
+
+	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
 /*** Function Grouping: merge_incore_*() and their internal variants ***/
@@ -3307,6 +3345,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 {
 	struct object_id working_tree_oid;
 
+	trace2_region_enter("merge", "collect_merge_info", opt->repo);
 	if (collect_merge_info(opt, merge_base, side1, side2) != 0) {
 		/*
 		 * TRANSLATORS: The %s arguments are: 1) tree hash of a merge
@@ -3319,10 +3358,16 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 		result->clean = -1;
 		return;
 	}
+	trace2_region_leave("merge", "collect_merge_info", opt->repo);
 
+	trace2_region_enter("merge", "renames", opt->repo);
 	result->clean = detect_and_process_renames(opt, merge_base,
 						   side1, side2);
+	trace2_region_leave("merge", "renames", opt->repo);
+
+	trace2_region_enter("merge", "process_entries", opt->repo);
 	process_entries(opt, &working_tree_oid);
+	trace2_region_leave("merge", "process_entries", opt->repo);
 
 	/* Set return values */
 	result->tree = parse_tree_indirect(&working_tree_oid);
@@ -3423,9 +3468,15 @@ void merge_incore_nonrecursive(struct merge_options *opt,
 			       struct tree *side2,
 			       struct merge_result *result)
 {
+	trace2_region_enter("merge", "incore_nonrecursive", opt->repo);
+
+	trace2_region_enter("merge", "merge_start", opt->repo);
 	assert(opt->ancestor != NULL);
 	merge_start(opt, result);
+	trace2_region_leave("merge", "merge_start", opt->repo);
+
 	merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);
+	trace2_region_leave("merge", "incore_nonrecursive", opt->repo);
 }
 
 void merge_incore_recursive(struct merge_options *opt,
@@ -3434,9 +3485,15 @@ void merge_incore_recursive(struct merge_options *opt,
 			    struct commit *side2,
 			    struct merge_result *result)
 {
+	trace2_region_enter("merge", "incore_recursive", opt->repo);
+
 	/* We set the ancestor label based on the merge_bases */
 	assert(opt->ancestor == NULL);
 
+	trace2_region_enter("merge", "merge_start", opt->repo);
 	merge_start(opt, result);
+	trace2_region_leave("merge", "merge_start", opt->repo);
+
 	merge_ort_internal(opt, merge_bases, side1, side2, result);
+	trace2_region_leave("merge", "incore_recursive", opt->repo);
 }
-- 
2.30.0.135.g7f7d4a3e17


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/3] merge-ort: fix massive leak
  2021-01-24  6:01       ` [PATCH v4 1/3] merge-ort: fix massive leak Elijah Newren
@ 2021-01-24 19:11         ` Derrick Stolee
  0 siblings, 0 replies; 17+ messages in thread
From: Derrick Stolee @ 2021-01-24 19:11 UTC (permalink / raw)
  To: Elijah Newren, git
  Cc: Derrick Stolee, Jeff King, Jonathan Nieder, Jonathan Tan,
	Taylor Blau, gitster

On 1/24/2021 1:01 AM, Elijah Newren wrote:
> When a series of merges was performed (such as for a rebase or series of
> cherry-picks), only the data structures allocated by the final merge
> operation were being freed.  The problem was that while picking out
> pieces of merge-ort to upstream, I previously misread a certain section
> of merge_start() and assumed it was associated with a later
> optimization.  Include that section now, which ensures that if there was
> a previous merge operation, that we clear out result->priv and then
> re-use it for opt->priv, and otherwise we allocate opt->priv.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/merge-ort.c b/merge-ort.c
> index 05c6b2e0dc..b5845ff6e9 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3227,11 +3227,28 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
>  	assert(opt->obuf.len == 0);
>  
>  	assert(opt->priv == NULL);
> +	if (result->priv) {
> +		opt->priv = result->priv;
> +		result->priv = NULL;
> +		/*
> +		 * opt->priv non-NULL means we had results from a previous
> +		 * run; do a few sanity checks that user didn't mess with
> +		 * it in an obvious fashion.
> +		 */
> +		assert(opt->priv->call_depth == 0);
> +		assert(!opt->priv->toplevel_dir ||
> +		       0 == strlen(opt->priv->toplevel_dir));
> +	}

So instead of simply leaking result->priv, we re-use the
data for the next round.

>  
>  	/* Default to histogram diff.  Actually, just hardcode it...for now. */
>  	opt->xdl_opts = DIFF_WITH_ALG(opt, HISTOGRAM_DIFF);
>  
>  	/* Initialization of opt->priv, our internal merge data */
> +	if (opt->priv) {
> +		clear_or_reinit_internal_opts(opt->priv, 1);
> +		trace2_region_leave("merge", "allocate/init", opt->repo);
> +		return;
> +	}
>  	opt->priv = xcalloc(1, sizeof(*opt->priv));

and here you reset the data instead of reallocating it. OK.

-Stolee


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-01-24 19:15 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-08 20:51 [PATCH 0/1] And so it begins...merge/rename performance work Elijah Newren
2021-01-08 20:51 ` [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-08 20:59   ` Taylor Blau
2021-01-08 21:50     ` Elijah Newren
2021-01-08 21:55       ` Taylor Blau
2021-01-09  0:52         ` Elijah Newren
2021-01-13 22:11 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Elijah Newren
2021-01-13 22:11   ` [PATCH v2 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-14 19:08   ` [PATCH v2 0/1] And so it begins...merge/rename performance work Junio C Hamano
2021-01-14 20:18     ` Elijah Newren
2021-01-15 19:29   ` [PATCH v3 " Elijah Newren
2021-01-15 19:29     ` [PATCH v3 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-24  6:01     ` [PATCH v4 0/3] And so it begins...merge/rename performance work Elijah Newren
2021-01-24  6:01       ` [PATCH v4 1/3] merge-ort: fix massive leak Elijah Newren
2021-01-24 19:11         ` Derrick Stolee
2021-01-24  6:01       ` [PATCH v4 2/3] merge-ort: ignore the directory rename split conflict for now Elijah Newren
2021-01-24  6:01       ` [PATCH v4 3/3] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).