git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/7] Final optimization batch (#15): use memory pools
@ 2021-07-23 12:54 Elijah Newren via GitGitGadget
  2021-07-23 12:54 ` [PATCH 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
                   ` (8 more replies)
  0 siblings, 9 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-23 12:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren

This series textually depends on en/ort-perf-batch-14, but the ideas are
orthogonal to it and orthogonal to previous series. It can be reviewed
independently.

This series is more about strmaps & memory pools than merge logic. CC'ing
Peff since he reviewed the strmap work[1], and that work included a number
of decisions that specifically had this series in mind.

[1]
https://lore.kernel.org/git/20201111200701.GB39046@coredump.intra.peff.net/

=== Basic Optimization idea ===

In this series, I make use of memory pools to get faster allocations and
deallocations for many data structures that tend to all be deallocated at
the same time anyway.
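
For readers less familiar with the technique, here is a minimal sketch of a
bump-style memory pool. It is only an illustration under stated assumptions:
the names arena, arena_init, arena_alloc, and arena_discard are hypothetical,
and this is not Git's mem_pool implementation. It shows why allocation is
cheap (a pointer bump inside a large block) and why deallocation is cheap
(one free() per block instead of one per object).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump allocator for illustration; NOT Git's mem_pool. */
struct arena_block {
	struct arena_block *next;
	size_t used, cap;
	max_align_t data[];	/* keeps the payload suitably aligned */
};

struct arena {
	struct arena_block *head;
	size_t block_size;
};

static void arena_init(struct arena *a, size_t block_size)
{
	a->head = NULL;
	a->block_size = block_size ? block_size : 4096;
}

static void *arena_alloc(struct arena *a, size_t len)
{
	struct arena_block *b = a->head;
	const size_t align = sizeof(max_align_t);

	/* Round up so every returned pointer stays aligned. */
	len = (len + align - 1) / align * align;

	if (!b || b->cap - b->used < len) {
		/* Current block is full (or absent): start a new one. */
		size_t cap = len > a->block_size ? len : a->block_size;

		b = malloc(sizeof(*b) + cap);
		if (!b)
			abort();
		b->next = a->head;
		b->used = 0;
		b->cap = cap;
		a->head = b;
	}
	b->used += len;
	return (char *)b->data + b->used - len;
}

/* Release everything at once; individual objects are never freed. */
static void arena_discard(struct arena *a)
{
	while (a->head) {
		struct arena_block *next = a->head->next;

		free(a->head);
		a->head = next;
	}
	assert(!a->head);
}
```

The trade-off is that nothing allocated from the pool can be released
individually, which suits data structures that are all torn down together.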

=== Results ===

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28), the
changes in just this series improve the performance as follows:

                     Before Series           After Series
no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms


As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


=== Overall Results across all optimization work ===

This is my final prepared optimization series. It might be worth reviewing
how my optimizations fared overall, comparing the original merge-recursive
timings with three things: how much merge-recursive improved (as a
side-effect of optimizing merge-ort), how much improvement we would have
gotten from a hypothetical infinite parallelization of rename detection, and
what I achieved at the end with merge-ort:

                               Timings

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename    merge-ort
                 v2.30.0      current     detection     current
                ----------   ---------   -----------   ---------
no-renames:       18.912 s    18.030 s     11.699 s     198.3 ms
mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms

                           Speedup factors

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
no-renames:         1           1.05         1.6           95
mega-renames:       1          16.5         29           9012
just-one-mega:      1          13.6         20            565
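
The merge-ort speedup factors can be reproduced from the timings table by
dividing the merge-recursive v2.30.0 time by the merge-ort time. The helper
below is a small sketch of that arithmetic; the function names are
hypothetical and the timings are copied from the tables above.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup factor: how many times faster "current" is than "baseline". */
static double speedup_factor(double baseline_s, double current_s)
{
	assert(current_s > 0);
	return baseline_s / current_s;
}

/* Print merge-recursive v2.30.0 vs. merge-ort for the three testcases. */
static void print_merge_ort_speedups(FILE *out)
{
	static const struct {
		const char *name;
		double recursive_s;	/* merge-recursive v2.30.0 */
		double ort_s;		/* merge-ort current */
	} cases[] = {
		{ "no-renames",      18.912,   0.1983 },
		{ "mega-renames",  5964.031,   0.6618 },
		{ "just-one-mega",  149.583,   0.2646 },
	};
	size_t i;

	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		fprintf(out, "%-14s %8.0f\n", cases[i].name,
			speedup_factor(cases[i].recursive_s, cases[i].ort_s));
}
```

Rounded to the nearest integer, these ratios match the 95, 9012, and 565
reported in the merge-ort column.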


And, for partial clone users:

             Factor reduction in number of objects needed

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
mega-renames:       1            1            1          181.3


=== Caveat ===

It may be worth noting, though, that my optimization numbers above for
merge-ort use test-tool fast-rebase. git rebase -s ort on the three
testcases above is 5-20 times slower (taking 3.835s, 6.798s, and 1.235s,
respectively). At this point, any further optimization work should go into
making a faster full-featured rebase by copying the ideas from fast-rebase:
avoid unnecessary process forking, avoid updating the index and working copy
until either the rebase is finished or you hit a conflict (and don't write
rebase metadata to disk until that point either), get rid of the glacially
slow revision walking of the upstream side of history (nuke
can_fast_forward(), make --reapply-cherry-picks the default) or at least
don't revision walk so many times (multiple calls to get_merge_bases in
can_fast_forward() plus an is_linear_history() walk, checking for upstream
cherry-picks, probably more), turn off per-commit hooks that probably should
have never been on anyway, etc.

Elijah Newren (7):
  diffcore-rename: use a mem_pool for exact rename detection's hashmap
  merge-ort: set up a memory pool
  merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  merge-ort: switch our strmaps over to using memory pools
  diffcore-rename, merge-ort: add wrapper functions for filepair
    alloc/dealloc
  merge-ort: store filepairs and filespecs in our mem_pool
  merge-ort: reuse path strings in pool_alloc_filespec

 diffcore-rename.c |  66 +++++++++++--
 diffcore.h        |   3 +
 merge-ort.c       | 247 +++++++++++++++++++++++++++++++++++-----------
 3 files changed, 251 insertions(+), 65 deletions(-)


base-commit: c9ada8369e6575be488028aae0f654422a9b1410
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-990%2Fnewren%2Fort-perf-batch-15-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-990/newren/ort-perf-batch-15-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/990
-- 
gitgitgadget

* [PATCH 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap
  2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
@ 2021-07-23 12:54 ` Elijah Newren via GitGitGadget
  2021-07-23 21:59   ` Eric Sunshine
  2021-07-23 12:54 ` [PATCH 2/7] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-23 12:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Exact rename detection, via insert_file_table(), uses a hashmap to store
files by oid.  Use a mem_pool for the hashmap entries so these can all be
allocated and deallocated together.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      204.2  ms ±  3.0  ms   202.5  ms ±  3.2  ms
    mega-renames:      1.076 s ±  0.015 s     1.072 s ±  0.012 s
    just-one-mega:   364.1  ms ±  7.0  ms   357.3  ms ±  3.9  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 4ef0459cfb5..23b917eca42 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -317,10 +317,11 @@ static int find_identical_files(struct hashmap *srcs,
 }
 
 static void insert_file_table(struct repository *r,
+			      struct mem_pool *pool,
 			      struct hashmap *table, int index,
 			      struct diff_filespec *filespec)
 {
-	struct file_similarity *entry = xmalloc(sizeof(*entry));
+	struct file_similarity *entry = mem_pool_alloc(pool, sizeof(*entry));
 
 	entry->index = index;
 	entry->filespec = filespec;
@@ -336,7 +337,8 @@ static void insert_file_table(struct repository *r,
  * and then during the second round we try to match
  * cache-dirty entries as well.
  */
-static int find_exact_renames(struct diff_options *options)
+static int find_exact_renames(struct diff_options *options,
+			      struct mem_pool *pool)
 {
 	int i, renames = 0;
 	struct hashmap file_table;
@@ -346,7 +348,7 @@ static int find_exact_renames(struct diff_options *options)
 	 */
 	hashmap_init(&file_table, NULL, NULL, rename_src_nr);
 	for (i = rename_src_nr-1; i >= 0; i--)
-		insert_file_table(options->repo,
+		insert_file_table(options->repo, pool,
 				  &file_table, i,
 				  rename_src[i].p->one);
 
@@ -355,7 +357,7 @@ static int find_exact_renames(struct diff_options *options)
 		renames += find_identical_files(&file_table, i, options);
 
 	/* Free the hash data structure and entries */
-	hashmap_clear_and_free(&file_table, struct file_similarity, entry);
+	hashmap_clear(&file_table);
 
 	return renames;
 }
@@ -1341,6 +1343,7 @@ void diffcore_rename_extended(struct diff_options *options,
 	int num_destinations, dst_cnt;
 	int num_sources, want_copies;
 	struct progress *progress = NULL;
+	struct mem_pool local_pool;
 	struct dir_rename_info info;
 	struct diff_populate_filespec_options dpf_options = {
 		.check_binary = 0,
@@ -1409,11 +1412,18 @@ void diffcore_rename_extended(struct diff_options *options,
 		goto cleanup; /* nothing to do */
 
 	trace2_region_enter("diff", "exact renames", options->repo);
+	mem_pool_init(&local_pool, 32*1024);
 	/*
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
-	rename_count = find_exact_renames(options);
+	rename_count = find_exact_renames(options, &local_pool);
+	/*
+	 * Discard local_pool immediately instead of at "cleanup:" in order
+	 * to reduce maximum memory usage; inexact rename detection uses up
+	 * a fair amount of memory, and mem_pools can too.
+	 */
+	mem_pool_discard(&local_pool, 0);
 	trace2_region_leave("diff", "exact renames", options->repo);
 
 	/* Did we only want exact renames? */
-- 
gitgitgadget


* [PATCH 2/7] merge-ort: set up a memory pool
  2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
  2021-07-23 12:54 ` [PATCH 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
@ 2021-07-23 12:54 ` Elijah Newren via GitGitGadget
  2021-07-23 12:54 ` [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-23 12:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge-ort has a lot of data structures, and they all tend to be freed
together in clear_or_reinit_internal_opts().  Set up a memory pool to
allow us to make these allocations and deallocations faster.  Future
commits will adjust various callers to make use of this memory pool.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index e361443087a..cb33c76760f 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -37,6 +37,8 @@
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
 
+#define USE_MEMORY_POOL 1 /* faster, but obscures memory leak hunting */
+
 /*
  * We have many arrays of size 3.  Whenever we have such an array, the
  * indices refer to one of the sides of the three-way merge.  This is so
@@ -339,6 +341,17 @@ struct merge_options_internal {
 	 */
 	struct strmap conflicted;
 
+	/*
+	 * pool: memory pool for fast allocation/deallocation
+	 *
+	 * We allocate room for lots of filenames and auxiliary data
+	 * structures in merge_options_internal, and it tends to all be
+	 * freed together too.  Using a memory pool for these provides a
+	 * nice speedup.
+	 */
+	struct mem_pool internal_pool;
+	struct mem_pool *pool; /* NULL, or pointer to internal_pool */
+
 	/*
 	 * paths_to_free: additional list of strings to free
 	 *
@@ -603,6 +616,12 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_clear(&opti->output, 0);
 	}
 
+#if USE_MEMORY_POOL
+	mem_pool_discard(&opti->internal_pool, 0);
+	if (!reinitialize)
+		opti->pool = NULL;
+#endif
+
 	/* Clean out callback_data as well. */
 	FREE_AND_NULL(renames->callback_data);
 	renames->callback_data_nr = renames->callback_data_alloc = 0;
@@ -4344,6 +4363,12 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	/* Initialization of various renames fields */
 	renames = &opt->priv->renames;
+#if USE_MEMORY_POOL
+	mem_pool_init(&opt->priv->internal_pool, 0);
+	opt->priv->pool = &opt->priv->internal_pool;
+#else
+	opt->priv->pool = NULL;
+#endif
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
 					    NOT_RELEVANT, NULL, 0);
-- 
gitgitgadget


* [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
  2021-07-23 12:54 ` [PATCH 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
  2021-07-23 12:54 ` [PATCH 2/7] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
@ 2021-07-23 12:54 ` Elijah Newren via GitGitGadget
  2021-07-23 22:07   ` Eric Sunshine
  2021-07-26 14:36   ` Derrick Stolee
  2021-07-23 12:54 ` [PATCH 4/7] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-23 12:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

We need functions which will either call
    xmalloc, xcalloc, xstrndup
or
    mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
depending on whether we have a non-NULL memory pool.  Add these
functions; the next commit will make use of these.
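
As a standalone illustration of this dispatch-on-NULL pattern (not the code
from the patch), the sketch below uses a hypothetical fixed-size toy_pool in
place of struct mem_pool. The maybe_pool_* names are likewise assumptions.
It also shows the matching rule on the release side: memory is only passed
to free() when no pool is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size pool, standing in for a real memory pool. */
struct toy_pool {
	_Alignas(max_align_t) char buf[1024];
	size_t used;
};

static void *toy_pool_alloc(struct toy_pool *pool, size_t size)
{
	void *p;

	size = (size + 15) & ~(size_t)15;	/* keep 16-byte granularity */
	assert(size <= sizeof(pool->buf) - pool->used);
	p = pool->buf + pool->used;
	pool->used += size;
	return p;
}

/*
 * Copy the first len bytes of s into a new NUL-terminated string.
 * The caller guarantees len <= strlen(s).  With a NULL pool this falls
 * back to the heap; otherwise the bytes come from the pool.
 */
static char *maybe_pool_strndup(struct toy_pool *pool, const char *s,
				size_t len)
{
	char *out = pool ? toy_pool_alloc(pool, len + 1) : malloc(len + 1);

	if (!out)
		abort();
	memcpy(out, s, len);
	out[len] = '\0';
	return out;
}

/* Pool memory is reclaimed wholesale later, so only free heap memory. */
static void maybe_pool_free(struct toy_pool *pool, void *p)
{
	if (!pool)
		free(p);
}
```

Keeping the NULL check inside the wrappers lets callers stay identical
whether or not a pool is configured.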

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index cb33c76760f..2bca4b71f2a 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -683,6 +683,30 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
+MAYBE_UNUSED
+static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
+{
+	if (!pool)
+		return xcalloc(count, size);
+	return mem_pool_calloc(pool, count, size);
+}
+
+MAYBE_UNUSED
+static void *pool_alloc(struct mem_pool *pool, size_t size)
+{
+	if (!pool)
+		return xmalloc(size);
+	return mem_pool_alloc(pool, size);
+}
+
+MAYBE_UNUSED
+static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
+{
+	if (!pool)
+		return xstrndup(str, len);
+	return mem_pool_strndup(pool, str, len);
+}
+
 /* add a string to a strbuf, but converting "/" to "_" */
 static void add_flattened_path(struct strbuf *out, const char *s)
 {
-- 
gitgitgadget


* [PATCH 4/7] merge-ort: switch our strmaps over to using memory pools
  2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-07-23 12:54 ` [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
@ 2021-07-23 12:54 ` Elijah Newren via GitGitGadget
  2021-07-23 12:54 ` [PATCH 5/7] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-23 12:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For all the strmaps (including strintmaps and strsets) whose memory is
unconditionally freed as part of clear_or_reinit_internal_opts(), switch
them over to using our new memory pool.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      202.5  ms ±  3.2  ms    198.1 ms ±  2.6 ms
    mega-renames:      1.072 s ±  0.012 s    715.8 ms ±  4.0 ms
    just-one-mega:   357.3  ms ±  3.9  ms    276.8 ms ±  4.2 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 125 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 75 insertions(+), 50 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 2bca4b71f2a..5fd2a4ccd35 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -539,15 +539,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	void (*strset_func)(struct strset *) =
 		reinitialize ? strset_partial_clear : strset_clear;
 
-	/*
-	 * We marked opti->paths with strdup_strings = 0, so that we
-	 * wouldn't have to make another copy of the fullpath created by
-	 * make_traverse_path from setup_path_info().  But, now that we've
-	 * used it and have no other references to these strings, it is time
-	 * to deallocate them.
-	 */
-	free_strmap_strings(&opti->paths);
-	strmap_func(&opti->paths, 1);
+	if (opti->pool)
+		strmap_func(&opti->paths, 0);
+	else {
+		/*
+		 * We marked opti->paths with strdup_strings = 0, so that
+		 * we wouldn't have to make another copy of the fullpath
+		 * created by make_traverse_path from setup_path_info().
+		 * But, now that we've used it and have no other references
+		 * to these strings, it is time to deallocate them.
+		 */
+		free_strmap_strings(&opti->paths);
+		strmap_func(&opti->paths, 1);
+	}
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
@@ -556,16 +560,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	 */
 	strmap_func(&opti->conflicted, 0);
 
-	/*
-	 * opti->paths_to_free is similar to opti->paths; we created it with
-	 * strdup_strings = 0 to avoid making _another_ copy of the fullpath
-	 * but now that we've used it and have no other references to these
-	 * strings, it is time to deallocate them.  We do so by temporarily
-	 * setting strdup_strings to 1.
-	 */
-	opti->paths_to_free.strdup_strings = 1;
-	string_list_clear(&opti->paths_to_free, 0);
-	opti->paths_to_free.strdup_strings = 0;
+	if (!opti->pool) {
+		/*
+		 * opti->paths_to_free is similar to opti->paths; we
+		 * created it with strdup_strings = 0 to avoid making
+		 * _another_ copy of the fullpath but now that we've used
+		 * it and have no other references to these strings, it is
+		 * time to deallocate them.  We do so by temporarily
+		 * setting strdup_strings to 1.
+		 */
+		opti->paths_to_free.strdup_strings = 1;
+		string_list_clear(&opti->paths_to_free, 0);
+		opti->paths_to_free.strdup_strings = 0;
+	}
 
 	if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
 		discard_index(&opti->attr_index);
@@ -683,7 +690,6 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
-MAYBE_UNUSED
 static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 {
 	if (!pool)
@@ -691,7 +697,6 @@ static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 	return mem_pool_calloc(pool, count, size);
 }
 
-MAYBE_UNUSED
 static void *pool_alloc(struct mem_pool *pool, size_t size)
 {
 	if (!pool)
@@ -699,7 +704,6 @@ static void *pool_alloc(struct mem_pool *pool, size_t size)
 	return mem_pool_alloc(pool, size);
 }
 
-MAYBE_UNUSED
 static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
 {
 	if (!pool)
@@ -835,8 +839,9 @@ static void setup_path_info(struct merge_options *opt,
 	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
 	assert(resolved == (merged_version != NULL));
 
-	mi = xcalloc(1, resolved ? sizeof(struct merged_info) :
-				   sizeof(struct conflict_info));
+	mi = pool_calloc(opt->priv->pool, 1,
+			 resolved ? sizeof(struct merged_info) :
+				    sizeof(struct conflict_info));
 	mi->directory_name = current_dir_name;
 	mi->basename_offset = current_dir_name_len;
 	mi->clean = !!resolved;
@@ -1128,7 +1133,7 @@ static int collect_merge_info_callback(int n,
 	len = traverse_path_len(info, p->pathlen);
 
 	/* +1 in both of the following lines to include the NUL byte */
-	fullpath = xmalloc(len + 1);
+	fullpath = pool_alloc(opt->priv->pool, len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
@@ -1383,7 +1388,7 @@ static int handle_deferred_entries(struct merge_options *opt,
 		copy = renames->deferred[side].possible_trivial_merges;
 		strintmap_init_with_options(&renames->deferred[side].possible_trivial_merges,
 					    0,
-					    NULL,
+					    opt->priv->pool,
 					    0);
 		strintmap_for_each_entry(&copy, &iter, entry) {
 			const char *path = entry->key;
@@ -2335,12 +2340,21 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 	VERIFY_CI(ci);
 
 	/* Find parent directories missing from opt->priv->paths */
-	cur_path = new_path;
+	if (opt->priv->pool) {
+		cur_path = mem_pool_strdup(opt->priv->pool, new_path);
+		free((char*)new_path);
+		new_path = (char *)cur_path;
+	} else {
+		cur_path = new_path;
+	}
+
 	while (1) {
 		/* Find the parent directory of cur_path */
 		char *last_slash = strrchr(cur_path, '/');
 		if (last_slash) {
-			parent_name = xstrndup(cur_path, last_slash - cur_path);
+			parent_name = pool_strndup(opt->priv->pool,
+						   cur_path,
+						   last_slash - cur_path);
 		} else {
 			parent_name = opt->priv->toplevel_dir;
 			break;
@@ -2349,7 +2363,8 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		/* Look it up in opt->priv->paths */
 		entry = strmap_get_entry(&opt->priv->paths, parent_name);
 		if (entry) {
-			free((char*)parent_name);
+			if (!opt->priv->pool)
+				free((char*)parent_name);
 			parent_name = entry->key; /* reuse known pointer */
 			break;
 		}
@@ -2376,12 +2391,15 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		parent_name = cur_dir;
 	}
 
-	/*
-	 * We are removing old_path from opt->priv->paths.  old_path also will
-	 * eventually need to be freed, but it may still be used by e.g.
-	 * ci->pathnames.  So, store it in another string-list for now.
-	 */
-	string_list_append(&opt->priv->paths_to_free, old_path);
+	if (!opt->priv->pool) {
+		/*
+		 * We are removing old_path from opt->priv->paths.
+		 * old_path also will eventually need to be freed, but it
+		 * may still be used by e.g.  ci->pathnames.  So, store it
+		 * in another string-list for now.
+		 */
+		string_list_append(&opt->priv->paths_to_free, old_path);
+	}
 
 	assert(ci->filemask == 2 || ci->filemask == 4);
 	assert(ci->dirmask == 0);
@@ -2416,7 +2434,8 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		new_ci->stages[index].mode = ci->stages[index].mode;
 		oidcpy(&new_ci->stages[index].oid, &ci->stages[index].oid);
 
-		free(ci);
+		if (!opt->priv->pool)
+			free(ci);
 		ci = new_ci;
 	}
 
@@ -3623,7 +3642,8 @@ static void process_entry(struct merge_options *opt,
 		 * the directory to remain here, so we need to move this
 		 * path to some new location.
 		 */
-		CALLOC_ARRAY(new_ci, 1);
+		new_ci = pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
+
 		/* We don't really want new_ci->merged.result copied, but it'll
 		 * be overwritten below so it doesn't matter.  We also don't
 		 * want any directory mode/oid values copied, but we'll zero
@@ -3713,7 +3733,7 @@ static void process_entry(struct merge_options *opt,
 			const char *a_path = NULL, *b_path = NULL;
 			int rename_a = 0, rename_b = 0;
 
-			new_ci = xmalloc(sizeof(*new_ci));
+			new_ci = pool_alloc(opt->priv->pool, sizeof(*new_ci));
 
 			if (S_ISREG(a_mode))
 				rename_a = 1;
@@ -3777,12 +3797,14 @@ static void process_entry(struct merge_options *opt,
 				strmap_remove(&opt->priv->paths, path, 0);
 				/*
 				 * We removed path from opt->priv->paths.  path
-				 * will also eventually need to be freed, but
-				 * it may still be used by e.g.  ci->pathnames.
-				 * So, store it in another string-list for now.
+				 * will also eventually need to be freed if not
+				 * part of a memory pool...but it may still be
+				 * used by e.g. ci->pathnames.  So, store it in
+				 * another string-list for now in that case.
 				 */
-				string_list_append(&opt->priv->paths_to_free,
-						   path);
+				if (!opt->priv->pool)
+					string_list_append(&opt->priv->paths_to_free,
+							   path);
 			}
 
 			/*
@@ -4322,6 +4344,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 {
 	struct rename_info *renames;
 	int i;
+	struct mem_pool *pool = NULL;
 
 	/* Sanity checks on opt */
 	trace2_region_enter("merge", "sanity checks", opt->repo);
@@ -4393,9 +4416,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 #else
 	opt->priv->pool = NULL;
 #endif
+	pool = opt->priv->pool;
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
-					    NOT_RELEVANT, NULL, 0);
+					    NOT_RELEVANT, pool, 0);
 		strmap_init_with_options(&renames->dir_rename_count[i],
 					 NULL, 1);
 		strmap_init_with_options(&renames->dir_renames[i],
@@ -4409,7 +4433,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 		 */
 		strintmap_init_with_options(&renames->relevant_sources[i],
 					    -1 /* explicitly invalid */,
-					    NULL, 0);
+					    pool, 0);
 		strmap_init_with_options(&renames->cached_pairs[i],
 					 NULL, 1);
 		strset_init_with_options(&renames->cached_irrelevant[i],
@@ -4419,9 +4443,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	}
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->deferred[i].possible_trivial_merges,
-					    0, NULL, 0);
+					    0, pool, 0);
 		strset_init_with_options(&renames->deferred[i].target_dirs,
-					 NULL, 1);
+					 pool, 1);
 		renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
 	}
 
@@ -4434,9 +4458,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * In contrast, conflicted just has a subset of keys from paths, so
 	 * we don't want to free those (it'd be a duplicate free).
 	 */
-	strmap_init_with_options(&opt->priv->paths, NULL, 0);
-	strmap_init_with_options(&opt->priv->conflicted, NULL, 0);
-	string_list_init(&opt->priv->paths_to_free, 0);
+	strmap_init_with_options(&opt->priv->paths, pool, 0);
+	strmap_init_with_options(&opt->priv->conflicted, pool, 0);
+	if (!opt->priv->pool)
+		string_list_init(&opt->priv->paths_to_free, 0);
 
 	/*
 	 * keys & strbufs in output will sometimes need to outlive "paths",
-- 
gitgitgadget


* [PATCH 5/7] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
  2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-07-23 12:54 ` [PATCH 4/7] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
@ 2021-07-23 12:54 ` Elijah Newren via GitGitGadget
  2021-07-23 12:54 ` [PATCH 6/7] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-23 12:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

We want to be able to allocate filespecs and filepairs using a mem_pool.
However, filespec data will still remain outside the pool (perhaps in
the future we could plumb the pool through the various diff APIs to
allocate the filespec data too, but for now we are limiting the scope).
Add some extra functions to allocate these appropriately based on the
non-NULL-ness of opt->priv->pool, as well as some extra functions to
handle correctly deallocating the relevant parts of them.  A future
commit will make use of these new functions.
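
The ownership split described above can be sketched in isolation. In this
hypothetical example (toy_spec and its helpers are illustrative names, not
Git's diff_filespec API), the struct and its path share one allocation,
with calloc() standing in for a pool allocation, while the data pointer is
heap memory that must still be released by hand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a filespec: fixed header, then the path. */
struct toy_spec {
	char *path;	/* points just past this struct */
	void *data;	/* separately malloc'd contents, outside the pool */
	int count;	/* reference count */
};

/* calloc() stands in for a pool allocation here. */
static struct toy_spec *toy_spec_new(const char *path)
{
	size_t len = strlen(path);
	struct toy_spec *spec = calloc(1, sizeof(*spec) + len + 1);

	if (!spec)
		abort();
	spec->path = (char *)(spec + 1);
	memcpy(spec->path, path, len);	/* calloc already supplied the NUL */
	spec->count = 1;
	return spec;
}

/*
 * Drop one reference; when none remain, free only the out-of-pool data.
 * The struct and its path would be reclaimed when the pool is discarded.
 */
static void toy_spec_release_data(struct toy_spec *spec)
{
	assert(spec->count > 0);
	if (!--spec->count) {
		free(spec->data);
		spec->data = NULL;
	}
}
```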

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 41 +++++++++++++++++++++++++++++++++++++++++
 diffcore.h        |  2 ++
 merge-ort.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 23b917eca42..09606501cea 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1328,6 +1328,47 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
 	rename_src_nr = new_num_src;
 }
 
+static void free_filespec_data(struct diff_filespec *spec)
+{
+	if (!--spec->count)
+		diff_free_filespec_data(spec);
+}
+
+MAYBE_UNUSED
+static void pool_free_filespec(struct mem_pool *pool,
+			       struct diff_filespec *spec)
+{
+	if (!pool) {
+		free_filespec(spec);
+		return;
+	}
+
+	/*
+	 * Similar to free_filespec(), but only frees the data.  The spec
+	 * itself was allocated in the pool and should not be individually
+	 * freed.
+	 */
+	free_filespec_data(spec);
+}
+
+MAYBE_UNUSED
+void pool_diff_free_filepair(struct mem_pool *pool,
+			     struct diff_filepair *p)
+{
+	if (!pool) {
+		diff_free_filepair(p);
+		return;
+	}
+
+	/*
+	 * Similar to diff_free_filepair() but only frees the data from the
+	 * filespecs; not the filespecs or the filepair which were
+	 * allocated from the pool.
+	 */
+	free_filespec_data(p->one);
+	free_filespec_data(p->two);
+}
+
 void diffcore_rename_extended(struct diff_options *options,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
diff --git a/diffcore.h b/diffcore.h
index 533b30e21e7..b58ee6b1934 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -127,6 +127,8 @@ struct diff_filepair {
 #define DIFF_PAIR_MODE_CHANGED(p) ((p)->one->mode != (p)->two->mode)
 
 void diff_free_filepair(struct diff_filepair *);
+void pool_diff_free_filepair(struct mem_pool *pool,
+			     struct diff_filepair *p);
 
 int diff_unmodified_pair(struct diff_filepair *);
 
diff --git a/merge-ort.c b/merge-ort.c
index 5fd2a4ccd35..59428e45884 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -690,6 +690,48 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
+MAYBE_UNUSED
+static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
+						 const char *path)
+{
+	struct diff_filespec *spec;
+	size_t len;
+
+	if (!pool)
+		return alloc_filespec(path);
+
+	/* Same code as alloc_filespec, except allocate from pool */
+	len = strlen(path);
+
+	spec = mem_pool_calloc(pool, 1, st_add3(sizeof(*spec), len, 1));
+	memcpy(spec+1, path, len);
+	spec->path = (void*)(spec+1);
+
+	spec->count = 1;
+	spec->is_binary = -1;
+	return spec;
+}
+
+MAYBE_UNUSED
+static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
+					     struct diff_queue_struct *queue,
+					     struct diff_filespec *one,
+					     struct diff_filespec *two)
+{
+	struct diff_filepair *dp;
+
+	if (!pool)
+		return diff_queue(queue, one, two);
+
+	/* Same code as diff_queue, except allocate from pool */
+	dp = mem_pool_calloc(pool, 1, sizeof(*dp));
+	dp->one = one;
+	dp->two = two;
+	if (queue)
+		diff_q(queue, dp);
+	return dp;
+}
+
 static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 {
 	if (!pool)
-- 
gitgitgadget



* [PATCH 6/7] merge-ort: store filepairs and filespecs in our mem_pool
  2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-07-23 12:54 ` [PATCH 5/7] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
@ 2021-07-23 12:54 ` Elijah Newren via GitGitGadget
  2021-07-23 12:54 ` [PATCH 7/7] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-23 12:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.1 ms ±  2.6 ms     198.5 ms ±  3.4 ms
    mega-renames:     715.8 ms ±  4.0 ms     679.1 ms ±  5.6 ms
    just-one-mega:    276.8 ms ±  4.2 ms     271.9 ms ±  2.8 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c |  9 ++++-----
 diffcore.h        |  1 +
 merge-ort.c       | 26 ++++++++++++++------------
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 09606501cea..e30e4288d1b 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1334,7 +1334,6 @@ static void free_filespec_data(struct diff_filespec *spec)
 		diff_free_filespec_data(spec);
 }
 
-MAYBE_UNUSED
 static void pool_free_filespec(struct mem_pool *pool,
 			       struct diff_filespec *spec)
 {
@@ -1351,7 +1350,6 @@ static void pool_free_filespec(struct mem_pool *pool,
 	free_filespec_data(spec);
 }
 
-MAYBE_UNUSED
 void pool_diff_free_filepair(struct mem_pool *pool,
 			     struct diff_filepair *p)
 {
@@ -1370,6 +1368,7 @@ void pool_diff_free_filepair(struct mem_pool *pool,
 }
 
 void diffcore_rename_extended(struct diff_options *options,
+			      struct mem_pool *pool,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
 			      struct strmap *dir_rename_count,
@@ -1683,7 +1682,7 @@ void diffcore_rename_extended(struct diff_options *options,
 			pair_to_free = p;
 
 		if (pair_to_free)
-			diff_free_filepair(pair_to_free);
+			pool_diff_free_filepair(pool, pair_to_free);
 	}
 	diff_debug_queue("done copying original", &outq);
 
@@ -1693,7 +1692,7 @@ void diffcore_rename_extended(struct diff_options *options,
 
 	for (i = 0; i < rename_dst_nr; i++)
 		if (rename_dst[i].filespec_to_free)
-			free_filespec(rename_dst[i].filespec_to_free);
+			pool_free_filespec(pool, rename_dst[i].filespec_to_free);
 
 	cleanup_dir_rename_info(&info, dirs_removed, dir_rename_count != NULL);
 	FREE_AND_NULL(rename_dst);
@@ -1710,5 +1709,5 @@ void diffcore_rename_extended(struct diff_options *options,
 
 void diffcore_rename(struct diff_options *options)
 {
-	diffcore_rename_extended(options, NULL, NULL, NULL, NULL);
+	diffcore_rename_extended(options, NULL, NULL, NULL, NULL, NULL);
 }
diff --git a/diffcore.h b/diffcore.h
index b58ee6b1934..badc2261c20 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -181,6 +181,7 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count);
 void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
 void diffcore_rename_extended(struct diff_options *options,
+			      struct mem_pool *pool,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
 			      struct strmap *dir_rename_count,
diff --git a/merge-ort.c b/merge-ort.c
index 59428e45884..d29c7fe8a30 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -690,7 +690,6 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
-MAYBE_UNUSED
 static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 						 const char *path)
 {
@@ -712,7 +711,6 @@ static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 	return spec;
 }
 
-MAYBE_UNUSED
 static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
 					     struct diff_queue_struct *queue,
 					     struct diff_filespec *one,
@@ -930,6 +928,7 @@ static void add_pair(struct merge_options *opt,
 		     unsigned dir_rename_mask)
 {
 	struct diff_filespec *one, *two;
+	struct mem_pool *pool = opt->priv->pool;
 	struct rename_info *renames = &opt->priv->renames;
 	int names_idx = is_add ? side : 0;
 
@@ -980,11 +979,11 @@ static void add_pair(struct merge_options *opt,
 			return;
 	}
 
-	one = alloc_filespec(pathname);
-	two = alloc_filespec(pathname);
+	one = pool_alloc_filespec(pool, pathname);
+	two = pool_alloc_filespec(pool, pathname);
 	fill_filespec(is_add ? two : one,
 		      &names[names_idx].oid, 1, names[names_idx].mode);
-	diff_queue(&renames->pairs[side], one, two);
+	pool_diff_queue(pool, &renames->pairs[side], one, two);
 }
 
 static void collect_rename_info(struct merge_options *opt,
@@ -2893,6 +2892,7 @@ static void use_cached_pairs(struct merge_options *opt,
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *entry;
+	struct mem_pool *pool = opt->priv->pool;
 
 	/*
 	 * Add to side_pairs all entries from renames->cached_pairs[side_index].
@@ -2906,9 +2906,9 @@ static void use_cached_pairs(struct merge_options *opt,
 			new_name = old_name;
 
 		/* We don't care about oid/mode, only filenames and status */
-		one = alloc_filespec(old_name);
-		two = alloc_filespec(new_name);
-		diff_queue(pairs, one, two);
+		one = pool_alloc_filespec(pool, old_name);
+		two = pool_alloc_filespec(pool, new_name);
+		pool_diff_queue(pool, pairs, one, two);
 		pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D';
 	}
 }
@@ -3016,6 +3016,7 @@ static int detect_regular_renames(struct merge_options *opt,
 	diff_queued_diff = renames->pairs[side_index];
 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diffcore_rename_extended(&diff_opts,
+				 opt->priv->pool,
 				 &renames->relevant_sources[side_index],
 				 &renames->dirs_removed[side_index],
 				 &renames->dir_rename_count[side_index],
@@ -3066,7 +3067,7 @@ static int collect_renames(struct merge_options *opt,
 
 		if (p->status != 'A' && p->status != 'R') {
 			possibly_cache_new_pair(renames, p, side_index, NULL);
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 			continue;
 		}
 
@@ -3079,7 +3080,7 @@ static int collect_renames(struct merge_options *opt,
 
 		possibly_cache_new_pair(renames, p, side_index, new_path);
 		if (p->status != 'R' && !new_path) {
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 			continue;
 		}
 
@@ -3197,7 +3198,7 @@ cleanup:
 		side_pairs = &renames->pairs[s];
 		for (i = 0; i < side_pairs->nr; ++i) {
 			struct diff_filepair *p = side_pairs->queue[i];
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 		}
 	}
 
@@ -3210,7 +3211,8 @@ simple_cleanup:
 	if (combined.nr) {
 		int i;
 		for (i = 0; i < combined.nr; i++)
-			diff_free_filepair(combined.queue[i]);
+			pool_diff_free_filepair(opt->priv->pool,
+						combined.queue[i]);
 		free(combined.queue);
 	}
 
-- 
gitgitgadget



* [PATCH 7/7] merge-ort: reuse path strings in pool_alloc_filespec
  2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-07-23 12:54 ` [PATCH 6/7] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
@ 2021-07-23 12:54 ` Elijah Newren via GitGitGadget
  2021-07-26 14:44 ` [PATCH 0/7] Final optimization batch (#15): use memory pools Derrick Stolee
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-23 12:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

pool_alloc_filespec() was written so that the code when pool != NULL
mimicked the code from alloc_filespec(), which included allocating
enough extra space for the path and then copying it.  However, the path
passed to pool_alloc_filespec() is always going to already be in the
same memory pool, so we may as well reuse it instead of copying it.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.5 ms ±  3.4 ms     198.3 ms ±  2.9 ms
    mega-renames:     679.1 ms ±  5.6 ms     661.8 ms ±  5.9 ms
    just-one-mega:    271.9 ms ±  2.8 ms     264.6 ms ±  2.5 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index d29c7fe8a30..0fb942692a7 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -694,17 +694,13 @@ static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 						 const char *path)
 {
 	struct diff_filespec *spec;
-	size_t len;
 
 	if (!pool)
 		return alloc_filespec(path);
 
-	/* Same code as alloc_filespec, except allocate from pool */
-	len = strlen(path);
-
-	spec = mem_pool_calloc(pool, 1, st_add3(sizeof(*spec), len, 1));
-	memcpy(spec+1, path, len);
-	spec->path = (void*)(spec+1);
+	/* Similar to alloc_filespec, but allocate from pool and reuse path */
+	spec = mem_pool_calloc(pool, 1, sizeof(*spec));
+	spec->path = (char*)path; /* spec won't modify it */
 
 	spec->count = 1;
 	spec->is_binary = -1;
@@ -2904,6 +2900,25 @@ static void use_cached_pairs(struct merge_options *opt,
 		const char *new_name = entry->value;
 		if (!new_name)
 			new_name = old_name;
+		if (pool) {
+			/*
+			 * cached_pairs has _copies_ of old_name and new_name,
+			 * because it has to persist across merges.  When
+			 *   pool != NULL
+			 * pool_alloc_filespec() will just re-use the existing
+			 * filenames, which will also get re-used by
+			 * opt->priv->paths if they become renames, and then
+			 * get freed at the end of the merge, leaving the copy
+			 * in cached_pairs dangling.  Avoid this by making a
+			 * copy here.
+			 *
+			 * When pool == NULL, pool_alloc_filespec() calls
+			 * alloc_filespec(), which makes a copy; we don't want
+			 * to add another.
+			 */
+			old_name = mem_pool_strdup(pool, old_name);
+			new_name = mem_pool_strdup(pool, new_name);
+		}
 
 		/* We don't care about oid/mode, only filenames and status */
 		one = pool_alloc_filespec(pool, old_name);
-- 
gitgitgadget


* Re: [PATCH 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap
  2021-07-23 12:54 ` [PATCH 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
@ 2021-07-23 21:59   ` Eric Sunshine
  2021-07-23 22:03     ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Sunshine @ 2021-07-23 21:59 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: Git List, Jeff King, Elijah Newren

On Fri, Jul 23, 2021 at 8:55 AM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Exact rename detection, via insert_file_table(), uses a hashmap to store
> files by oid.  Use a mem_pool for the hashmap entries so these can all be
> allocated and deallocated together.
> [...]
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> @@ -355,7 +357,7 @@ static int find_exact_renames(struct diff_options *options)
>         /* Free the hash data structure and entries */
> -       hashmap_clear_and_free(&file_table, struct file_similarity, entry);
> +       hashmap_clear(&file_table);

Does the in-code comment become a bit out of date with this change?
(It might make sense to drop the comment altogether -- or, if not,
explain that the hashmap entries get thrown away later with the pool?)

Not necessarily worth a re-roll.


* Re: [PATCH 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap
  2021-07-23 21:59   ` Eric Sunshine
@ 2021-07-23 22:03     ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2021-07-23 22:03 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Elijah Newren via GitGitGadget, Git List, Jeff King

On Fri, Jul 23, 2021 at 2:59 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, Jul 23, 2021 at 8:55 AM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > Exact rename detection, via insert_file_table(), uses a hashmap to store
> > files by oid.  Use a mem_pool for the hashmap entries so these can all be
> > allocated and deallocated together.
> > [...]
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > diff --git a/diffcore-rename.c b/diffcore-rename.c
> > @@ -355,7 +357,7 @@ static int find_exact_renames(struct diff_options *options)
> >         /* Free the hash data structure and entries */
> > -       hashmap_clear_and_free(&file_table, struct file_similarity, entry);
> > +       hashmap_clear(&file_table);
>
> Does the in-code comment become a bit out of date with this change?
> (It might make sense to drop the comment altogether -- or, if not,
> explain that the hashmap entries get thrown away later with the pool?)

Ah, good catch.  Yeah, I should update it or drop the comment.

> Not necessarily worth a re-roll.

I'm sure someone will comment on something else, so I'll just include
this among the fixes.


* Re: [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  2021-07-23 12:54 ` [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
@ 2021-07-23 22:07   ` Eric Sunshine
  2021-07-26 14:36   ` Derrick Stolee
  1 sibling, 0 replies; 65+ messages in thread
From: Eric Sunshine @ 2021-07-23 22:07 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: Git List, Jeff King, Elijah Newren

On Fri, Jul 23, 2021 at 8:55 AM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> We need functions which will either call
>     xmalloc, xcalloc, xstrndup
> or
>     mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
> depending on whether we have a non-NULL memory pool.  Add these
> functions; the next commit will make use of these.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>

Patch [2/7] feels somewhat incomplete without the utility functions
introduced by this patch (and, indeed, when reading [2/7], I was
wondering how you were going to deal with the potential NULL pointer).
From a review standpoint, I could easily see [2/7] and [3/7] presented
as a single patch, but it's not worth a re-roll.

> diff --git a/merge-ort.c b/merge-ort.c
> @@ -683,6 +683,30 @@ static void path_msg(struct merge_options *opt,
> +MAYBE_UNUSED
> +static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
> +{
> +       if (!pool)
> +               return xcalloc(count, size);
> +       return mem_pool_calloc(pool, count, size);
> +}
> +
> +MAYBE_UNUSED
> +static void *pool_alloc(struct mem_pool *pool, size_t size)
> +{
> +       if (!pool)
> +               return xmalloc(size);
> +       return mem_pool_alloc(pool, size);
> +}
> +
> +MAYBE_UNUSED
> +static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
> +{
> +       if (!pool)
> +               return xstrndup(str, len);
> +       return mem_pool_strndup(pool, str, len);
> +}
> +


* Re: [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  2021-07-23 12:54 ` [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
  2021-07-23 22:07   ` Eric Sunshine
@ 2021-07-26 14:36   ` Derrick Stolee
  2021-07-28 22:49     ` Elijah Newren
  1 sibling, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2021-07-26 14:36 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Jeff King, Elijah Newren

On 7/23/2021 8:54 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> We need functions which will either call
>     xmalloc, xcalloc, xstrndup
> or
>     mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
> depending on whether we have a non-NULL memory pool.  Add these
> functions; the next commit will make use of these.

I briefly considered that this should just be the way the
mem_pool_* methods work. It does rely on the caller knowing
to free() the allocated memory when their pool is NULL, so
perhaps such a universal change might be too much. What do
you think?

Thanks,
-Stolee
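
The concern above can be made concrete with a small sketch. It is not git's code: `struct demo_pool`, `demo_pool_alloc()`, `demo_alloc()`, and `demo_release()` are hypothetical stand-ins, with a fixed-size buffer playing the role of the pool. It only illustrates why a NULL-pool fallback inside the allocator pushes a matching "free only when unpooled" rule onto every caller.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory pool: one fixed, aligned buffer. */
struct demo_pool {
	_Alignas(max_align_t) char buf[1024];
	size_t used;	/* bytes of buf[] already handed out */
};

/* Bump-allocate from the buffer; NULL when the request does not fit. */
static void *demo_pool_alloc(struct demo_pool *pool, size_t len)
{
	const size_t align = _Alignof(max_align_t);
	void *ret;

	if (len > sizeof(pool->buf))
		return NULL;
	len = (len + align - 1) / align * align;	/* keep results aligned */
	if (len > sizeof(pool->buf) - pool->used)
		return NULL;
	ret = pool->buf + pool->used;
	pool->used += len;
	return ret;
}

/* Same shape as the pool_alloc() wrapper quoted in this thread. */
static void *demo_alloc(struct demo_pool *pool, size_t len)
{
	if (!pool)
		return malloc(len);
	return demo_pool_alloc(pool, len);
}

/*
 * The caller-side obligation: memory that came from malloc() must be
 * freed individually, while pooled memory must NOT be passed to free();
 * it goes away when the whole pool is discarded.
 */
static void demo_release(struct demo_pool *pool, void *ptr)
{
	if (!pool)
		free(ptr);
}
```

Every allocation site needs a matching release site that knows which case it is in, which is the bookkeeping a universal change to the mem_pool_*() functions would impose on all of their callers.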


* Re: [PATCH 0/7] Final optimization batch (#15): use memory pools
  2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-07-23 12:54 ` [PATCH 7/7] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
@ 2021-07-26 14:44 ` Derrick Stolee
  2021-07-28 22:52   ` Elijah Newren
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
  8 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2021-07-26 14:44 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Jeff King, Elijah Newren

On 7/23/2021 8:54 AM, Elijah Newren via GitGitGadget wrote:
...
> === Basic Optimization idea ===
> 
> In this series, I make use of memory pools to get faster allocations and
> deallocations for many data structures that tend to all be deallocated at
> the same time anyway.

Makes sense. This is appropriate for a final optimization, since the gains
tend to be quite small.

> === Results ===
> 
> For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
> performance work; instrument with trace2_region_* calls", 2020-10-28), the
> changes in just this series improves the performance as follows:
> 
>                      Before Series           After Series
> no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
> mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
> just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms


But these are larger than I anticipated! Amazing.

> === Overall Results across all optimization work ===

I enjoyed reading this section. I'm excited to make ORT the default in
the microsoft/git fork and see how this improves the lives of our users.

> === Caveat ===
> 
> It may be worth noting, though, that my optimization numbers above for
> merge-ort use test-tool fast-rebase. git rebase -s ort on the three
> testcases above is 5-20 times slower (taking 3.835s, 6.798s, and 1.235s,
> respectively).

The performance and behavior changes recommended here should definitely
be considered. However, the benefits still apply and at the moment users
do not expect immediate responses from 'git rebase' so we have some time
to approach with caution.

I only had one small question that is not even important to the
correctness of this series, so feel free to ignore it. The patches tell
a convincing story.

Just to be careful, have you taken the time to run the merge-ORT tests
with --valgrind?

Thanks,
-Stolee


* Re: [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  2021-07-26 14:36   ` Derrick Stolee
@ 2021-07-28 22:49     ` Elijah Newren
  2021-07-29 15:26       ` Jeff King
  0 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-07-28 22:49 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jeff King

On Mon, Jul 26, 2021 at 8:36 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/23/2021 8:54 AM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > We need functions which will either call
> >     xmalloc, xcalloc, xstrndup
> > or
> >     mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
> > depending on whether we have a non-NULL memory pool.  Add these
> > functions; the next commit will make use of these.
>
> I briefly considered that this should just be the way the
> mem_pool_* methods work. It does rely on the caller knowing
> to free() the allocated memory when their pool is NULL, so
> perhaps such a universal change might be too much. What do
> you think?

That's interesting, but I'm worried it might be a bit much.  Do others
on the list have an opinion here?


* Re: [PATCH 0/7] Final optimization batch (#15): use memory pools
  2021-07-26 14:44 ` [PATCH 0/7] Final optimization batch (#15): use memory pools Derrick Stolee
@ 2021-07-28 22:52   ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2021-07-28 22:52 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jeff King

On Mon, Jul 26, 2021 at 8:44 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/23/2021 8:54 AM, Elijah Newren via GitGitGadget wrote:
> ...
> > === Basic Optimization idea ===
> >
> > In this series, I make use of memory pools to get faster allocations and
> > deallocations for many data structures that tend to all be deallocated at
> > the same time anyway.
>
> Makes sense. This is appropriate for a final optimization, since the gains
> tend to be quite small.
>
> > === Results ===
> >
> > For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
> > performance work; instrument with trace2_region_* calls", 2020-10-28), the
> > changes in just this series improves the performance as follows:
> >
> >                      Before Series           After Series
> > no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
> > mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
> > just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms
>
>
> But these are larger than I anticipated! Amazing.
>
> > === Overall Results across all optimization work ===
>
> I enjoyed reading this section. I'm excited to make ORT the default in
> the microsoft/git fork and see how this improves the lives of our users.
>
> > === Caveat ===
> >
> > It may be worth noting, though, that my optimization numbers above for
> > merge-ort use test-tool fast-rebase. git rebase -s ort on the three
> > testcases above is 5-20 times slower (taking 3.835s, 6.798s, and 1.235s,
> > respectively).
>
> The performance and behavior changes recommended here should definitely
> be considered. However, the benefits still apply and at the moment users
> do not expect immediate responses from 'git rebase' so we have some time
> to approach with caution.
>
> I only had one small question that is not even important to the
> correctness of this series, so feel free to ignore it. The patches tell
> a convincing story.
>
> Just to be careful, have you taken the time to run the merge-ORT tests
> with --valgrind?

Yes.  In addition to the testsuite, I also ran the testcases above
under valgrind (especially mega-renames) -- and with those testcases I
had the leak checker turned on.  It was somewhat surprising how much
slowdown I saw when I introduced some accidental memory leaks while
optimizing.  But it all runs clean with the patches I submitted.


* [PATCH v2 0/7] Final optimization batch (#15): use memory pools
  2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
                   ` (7 preceding siblings ...)
  2021-07-26 14:44 ` [PATCH 0/7] Final optimization batch (#15): use memory pools Derrick Stolee
@ 2021-07-29  3:58 ` Elijah Newren via GitGitGadget
  2021-07-29  3:58   ` [PATCH v2 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
                     ` (9 more replies)
  8 siblings, 10 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-29  3:58 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren

This series textually depends on en/ort-perf-batch-14, but the ideas are
orthogonal to it and orthogonal to previous series. It can be reviewed
independently.

This series is more about strmaps & memory pools than merge logic. CC'ing
Peff since he reviewed the strmap work[1], and that work included a number
of decisions that specifically had this series in mind.

Changes since v1, addressing Eric's feedback:

 * Fixed a comment that became out-of-date in patch 1
 * Swapped commits 2 and 3 so that one can better motivate the other.

Note: Stolee also had an interesting question about whether we should tweak
the mem_pool_*() API; he and I were both worried it was a bit much, so I've
left it out unless others on list chime in with their opinions on that
change.

[1]
https://lore.kernel.org/git/20201111200701.GB39046@coredump.intra.peff.net/

=== Basic Optimization idea ===

In this series, I make use of memory pools to get faster allocations and
deallocations for many data structures that tend to all be deallocated at
the same time anyway.
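
As a rough illustration of why this helps, here is a minimal arena sketch. It is not git's mem_pool implementation: `struct toy_pool`, `struct toy_block`, `toy_pool_alloc()`, and `toy_pool_discard()` are hypothetical names, and the block-growth policy is an assumption chosen for brevity. The point is that each allocation is a pointer bump, and teardown is one free() per block rather than one per object.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy arena for illustration; NOT git's struct mem_pool. */
struct toy_block {
	struct toy_block *next;
	size_t used;		/* bytes already handed out from data[] */
	size_t cap;		/* total bytes available in data[] */
	max_align_t data[];	/* flexible array member keeps data[] aligned */
};

struct toy_pool {
	struct toy_block *head;
	size_t block_size;	/* default capacity of each new block */
};

/* Bump-allocate len bytes; returns NULL on overflow or out-of-memory. */
static void *toy_pool_alloc(struct toy_pool *pool, size_t len)
{
	const size_t align = sizeof(max_align_t);
	struct toy_block *b = pool->head;
	void *ret;

	if (len > SIZE_MAX - align)
		return NULL;
	len = (len + align - 1) / align * align;	/* round up */

	if (!b || b->cap - b->used < len) {
		/* Current block is full (or absent): start a new one. */
		size_t cap = len > pool->block_size ? len : pool->block_size;

		if (cap > SIZE_MAX - sizeof(*b))
			return NULL;
		b = malloc(sizeof(*b) + cap);
		if (!b)
			return NULL;
		b->next = pool->head;
		b->used = 0;
		b->cap = cap;
		pool->head = b;
	}
	ret = (char *)b->data + b->used;
	b->used += len;
	return ret;
}

/* Release everything at once: one free() per block, not per object. */
static void toy_pool_discard(struct toy_pool *pool)
{
	while (pool->head) {
		struct toy_block *next = pool->head->next;

		free(pool->head);
		pool->head = next;
	}
}
```

Individual objects are never freed in this scheme, which is only acceptable because the data structures in question tend to be deallocated together.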

=== Results ===

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28), the
changes in just this series improve the performance as follows:

                     Before Series           After Series
no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms


As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


=== Overall Results across all optimization work ===

This is my final prepared optimization series. It might be worth reviewing
how my optimizations fared overall, comparing the original merge-recursive
timings with three things: how much merge-recursive improved (as a
side-effect of optimizing merge-ort), how much improvement we would have
gotten from a hypothetical infinite parallelization of rename detection, and
what I achieved at the end with merge-ort:

                               Timings

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename    merge-ort
                 v2.30.0      current     detection     current
                ----------   ---------   -----------   ---------
no-renames:       18.912 s    18.030 s     11.699 s     198.3 ms
mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms

                           Speedup factors

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
no-renames:         1           1.05         1.6           95
mega-renames:       1          16.5         29           9012
just-one-mega:      1          13.6         20            565
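
For readers who want to check the arithmetic, the speedup factors are simply the v2.30.0 merge-recursive timing divided by the timing in the corresponding column. A small sketch follows; the helper names `speedup()` and `print_speedup()` are made up for illustration, and the timings are supplied by the caller in seconds.

```c
#include <stdio.h>

/* Speedup factor: how many times faster "after" is than "before". */
static double speedup(double before_s, double after_s)
{
	return before_s / after_s;
}

/* Print one table row; name and timings (in seconds) come from the caller. */
static void print_speedup(const char *name, double before_s, double after_s)
{
	printf("%-15s %8.1f\n", name, speedup(before_s, after_s));
}
```

For example, `speedup(5964.031, 0.6618)` for mega-renames rounds to the 9012 shown in the merge-ort column, and `speedup(18.912, 0.1983)` for no-renames rounds to 95.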


And, for partial clone users:

             Factor reduction in number of objects needed

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
mega-renames:       1            1            1          181.3


=== Caveat ===

It may be worth noting, though, that my optimization numbers above for
merge-ort use test-tool fast-rebase. git rebase -s ort on the three
testcases above is 5-20 times slower (taking 3.835s, 6.798s, and 1.235s,
respectively). At this point, any further optimization work should go into
making a faster full-featured rebase by copying the ideas from fast-rebase:
avoid unnecessary process forking, avoid updating the index and working copy
until either the rebase is finished or you hit a conflict (and don't write
rebase metadata to disk until that point either), get rid of the glacially
slow revision walking of the upstream side of history (nuke
can_fast_forward(), make --reapply-cherry-picks the default) or at least
don't revision walk so many times (multiple calls to get_merge_bases in
can_fast_forward() plus an is_linear_history() walk, checking for upstream
cherry-picks, probably more), turn off per-commit hooks that probably should
have never been on anyway, etc.

Elijah Newren (7):
  diffcore-rename: use a mem_pool for exact rename detection's hashmap
  merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  merge-ort: set up a memory pool
  merge-ort: switch our strmaps over to using memory pools
  diffcore-rename, merge-ort: add wrapper functions for filepair
    alloc/dealloc
  merge-ort: store filepairs and filespecs in our mem_pool
  merge-ort: reuse path strings in pool_alloc_filespec

 diffcore-rename.c |  68 +++++++++++--
 diffcore.h        |   3 +
 merge-ort.c       | 247 +++++++++++++++++++++++++++++++++++-----------
 3 files changed, 252 insertions(+), 66 deletions(-)


base-commit: c9ada8369e6575be488028aae0f654422a9b1410
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-990%2Fnewren%2Fort-perf-batch-15-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-990/newren/ort-perf-batch-15-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/990

Range-diff vs v1:

 1:  9f8ab62b842 ! 1:  ea08b34d29b diffcore-rename: use a mem_pool for exact rename detection's hashmap
     @@ diffcore-rename.c: static int find_exact_renames(struct diff_options *options)
       				  rename_src[i].p->one);
       
      @@ diffcore-rename.c: static int find_exact_renames(struct diff_options *options)
     + 	for (i = 0; i < rename_dst_nr; i++)
       		renames += find_identical_files(&file_table, i, options);
       
     - 	/* Free the hash data structure and entries */
     +-	/* Free the hash data structure and entries */
      -	hashmap_clear_and_free(&file_table, struct file_similarity, entry);
     ++	/* Free the hash data structure (entries will be freed with the pool) */
      +	hashmap_clear(&file_table);
       
       	return renames;
 3:  e30b8c8fea1 ! 2:  fdfc2b93ba4 merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
     @@ Commit message
          or
              mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
          depending on whether we have a non-NULL memory pool.  Add these
     -    functions; the next commit will make use of these.
     +    functions; a subsequent commit will make use of these.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
 2:  77367a69daa = 3:  c7150869107 merge-ort: set up a memory pool
 4:  8231c8e34cd = 4:  dd8839b2843 merge-ort: switch our strmaps over to using memory pools
 5:  2db932bc601 = 5:  560800a80ef diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
 6:  629d042884a = 6:  94d60c8a476 merge-ort: store filepairs and filespecs in our mem_pool
 7:  17aa0a74849 = 7:  fda885dabe6 merge-ort: reuse path strings in pool_alloc_filespec

-- 
gitgitgadget


* [PATCH v2 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
@ 2021-07-29  3:58   ` Elijah Newren via GitGitGadget
  2021-07-29  3:58   ` [PATCH v2 2/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-29  3:58 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Exact rename detection, via insert_file_table(), uses a hashmap to store
files by oid.  Use a mem_pool for the hashmap entries so these can all be
allocated and deallocated together.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      204.2  ms ±  3.0  ms   202.5  ms ±  3.2  ms
    mega-renames:      1.076 s ±  0.015 s     1.072 s ±  0.012 s
    just-one-mega:   364.1  ms ±  7.0  ms   357.3  ms ±  3.9  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 4ef0459cfb5..73d884099eb 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -317,10 +317,11 @@ static int find_identical_files(struct hashmap *srcs,
 }
 
 static void insert_file_table(struct repository *r,
+			      struct mem_pool *pool,
 			      struct hashmap *table, int index,
 			      struct diff_filespec *filespec)
 {
-	struct file_similarity *entry = xmalloc(sizeof(*entry));
+	struct file_similarity *entry = mem_pool_alloc(pool, sizeof(*entry));
 
 	entry->index = index;
 	entry->filespec = filespec;
@@ -336,7 +337,8 @@ static void insert_file_table(struct repository *r,
  * and then during the second round we try to match
  * cache-dirty entries as well.
  */
-static int find_exact_renames(struct diff_options *options)
+static int find_exact_renames(struct diff_options *options,
+			      struct mem_pool *pool)
 {
 	int i, renames = 0;
 	struct hashmap file_table;
@@ -346,7 +348,7 @@ static int find_exact_renames(struct diff_options *options)
 	 */
 	hashmap_init(&file_table, NULL, NULL, rename_src_nr);
 	for (i = rename_src_nr-1; i >= 0; i--)
-		insert_file_table(options->repo,
+		insert_file_table(options->repo, pool,
 				  &file_table, i,
 				  rename_src[i].p->one);
 
@@ -354,8 +356,8 @@ static int find_exact_renames(struct diff_options *options)
 	for (i = 0; i < rename_dst_nr; i++)
 		renames += find_identical_files(&file_table, i, options);
 
-	/* Free the hash data structure and entries */
-	hashmap_clear_and_free(&file_table, struct file_similarity, entry);
+	/* Free the hash data structure (entries will be freed with the pool) */
+	hashmap_clear(&file_table);
 
 	return renames;
 }
@@ -1341,6 +1343,7 @@ void diffcore_rename_extended(struct diff_options *options,
 	int num_destinations, dst_cnt;
 	int num_sources, want_copies;
 	struct progress *progress = NULL;
+	struct mem_pool local_pool;
 	struct dir_rename_info info;
 	struct diff_populate_filespec_options dpf_options = {
 		.check_binary = 0,
@@ -1409,11 +1412,18 @@ void diffcore_rename_extended(struct diff_options *options,
 		goto cleanup; /* nothing to do */
 
 	trace2_region_enter("diff", "exact renames", options->repo);
+	mem_pool_init(&local_pool, 32*1024);
 	/*
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
-	rename_count = find_exact_renames(options);
+	rename_count = find_exact_renames(options, &local_pool);
+	/*
+	 * Discard local_pool immediately instead of at "cleanup:" in order
+	 * to reduce maximum memory usage; inexact rename detection uses up
+	 * a fair amount of memory, and mem_pools can too.
+	 */
+	mem_pool_discard(&local_pool, 0);
 	trace2_region_leave("diff", "exact renames", options->repo);
 
 	/* Did we only want exact renames? */
-- 
gitgitgadget
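
The reason pooled hashmap entries can be dropped without freeing them one at a time can be sketched with a toy bump allocator. This is only an illustration under stated assumptions: the toy_* names, the block layout, and the return value of toy_pool_discard() are hypothetical and are not git's mem_pool implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy arena; NOT git's mem_pool. */
struct toy_block {
	struct toy_block *next;
	size_t used, cap;
	max_align_t data[];	/* payload, aligned for any object type */
};

struct toy_pool {
	struct toy_block *head;
	size_t block_size;
};

static void toy_pool_init(struct toy_pool *pool, size_t block_size)
{
	pool->head = NULL;
	pool->block_size = block_size;
}

/* Hand out memory by bumping an offset inside the newest block. */
static void *toy_pool_alloc(struct toy_pool *pool, size_t size)
{
	const size_t align = _Alignof(max_align_t);
	struct toy_block *b = pool->head;

	/* Round up so the next allocation stays suitably aligned. */
	size = (size + align - 1) & ~(align - 1);
	if (!b || b->cap - b->used < size) {
		size_t cap = size > pool->block_size ? size : pool->block_size;

		b = malloc(offsetof(struct toy_block, data) + cap);
		if (!b)
			abort();
		b->next = pool->head;
		b->used = 0;
		b->cap = cap;
		pool->head = b;
	}
	b->used += size;
	return (char *)b->data + b->used - size;
}

/* Free every block at once; returns the number of blocks released. */
static size_t toy_pool_discard(struct toy_pool *pool)
{
	size_t freed = 0;

	while (pool->head) {
		struct toy_block *next = pool->head->next;

		free(pool->head);
		pool->head = next;
		freed++;
	}
	return freed;
}

/* Stand-in for a small hashmap entry such as struct file_similarity. */
struct toy_entry {
	int index;
	const char *name;
};
```

With this layout, many small entries share one malloc'd block, so tearing down the table costs one free() per block rather than one per entry. That is the trade-off the patch relies on when it replaces hashmap_clear_and_free() with hashmap_clear() plus a pool discard.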



* [PATCH v2 2/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
  2021-07-29  3:58   ` [PATCH v2 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
@ 2021-07-29  3:58   ` Elijah Newren via GitGitGadget
  2021-07-29  3:58   ` [PATCH v2 3/7] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-29  3:58 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

We need functions that call either
    mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
when we have a non-NULL memory pool, or
    xmalloc, xcalloc, xstrndup
when the pool is NULL.  Add these functions; a subsequent commit will
make use of them.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index e361443087a..39ddc9b9f2f 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -664,6 +664,30 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
+MAYBE_UNUSED
+static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
+{
+	if (!pool)
+		return xcalloc(count, size);
+	return mem_pool_calloc(pool, count, size);
+}
+
+MAYBE_UNUSED
+static void *pool_alloc(struct mem_pool *pool, size_t size)
+{
+	if (!pool)
+		return xmalloc(size);
+	return mem_pool_alloc(pool, size);
+}
+
+MAYBE_UNUSED
+static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
+{
+	if (!pool)
+		return xstrndup(str, len);
+	return mem_pool_strndup(pool, str, len);
+}
+
 /* add a string to a strbuf, but converting "/" to "_" */
 static void add_flattened_path(struct strbuf *out, const char *s)
 {
-- 
gitgitgadget
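
The NULL-pool fallback pattern used by these wrappers can be tried in isolation with the sketch below. It assumes a hypothetical fixed-size toy_pool in place of git's mem_pool, and the maybe_pool_* names are made up for illustration. The matching "free only when there is no pool" helper mirrors the conditional free() calls that appear later in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pool: one fixed buffer and a bump offset. */
struct toy_pool {
	_Alignas(max_align_t) unsigned char buf[4096];
	size_t used;
};

static void *toy_pool_alloc(struct toy_pool *pool, size_t size)
{
	const size_t align = _Alignof(max_align_t);
	void *p;

	size = (size + align - 1) & ~(align - 1);
	if (size > sizeof(pool->buf) - pool->used)
		abort();	/* a real pool would grab another block */
	p = pool->buf + pool->used;
	pool->used += size;
	return p;
}

/* Allocate from the pool when there is one, else from the heap. */
static void *maybe_pool_alloc(struct toy_pool *pool, size_t size)
{
	void *p;

	if (pool)
		return toy_pool_alloc(pool, size);
	p = malloc(size);
	if (!p)
		abort();
	return p;
}

/* Copy at most len bytes of str and NUL-terminate the copy. */
static char *maybe_pool_strndup(struct toy_pool *pool,
				const char *str, size_t len)
{
	const char *end = memchr(str, '\0', len);
	size_t n = end ? (size_t)(end - str) : len;
	char *copy = maybe_pool_alloc(pool, n + 1);

	memcpy(copy, str, n);
	copy[n] = '\0';
	return copy;
}

/* Pool memory is released in bulk, so only heap memory is freed here. */
static void maybe_pool_free(struct toy_pool *pool, void *p)
{
	if (!pool)
		free(p);
}
```

Callers pass the same pool pointer to both the allocating and the freeing helper, so a single NULL check per call decides ownership and the calling code does not need two separate code paths.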



* [PATCH v2 3/7] merge-ort: set up a memory pool
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
  2021-07-29  3:58   ` [PATCH v2 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
  2021-07-29  3:58   ` [PATCH v2 2/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
@ 2021-07-29  3:58   ` Elijah Newren via GitGitGadget
  2021-07-29  3:58   ` [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-29  3:58 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge-ort has a lot of data structures, and they all tend to be freed
together in clear_or_reinit_internal_opts().  Set up a memory pool to
allow us to make these allocations and deallocations faster.  Future
commits will adjust various callers to make use of this memory pool.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 39ddc9b9f2f..2bca4b71f2a 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -37,6 +37,8 @@
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
 
+#define USE_MEMORY_POOL 1 /* faster, but obscures memory leak hunting */
+
 /*
  * We have many arrays of size 3.  Whenever we have such an array, the
  * indices refer to one of the sides of the three-way merge.  This is so
@@ -339,6 +341,17 @@ struct merge_options_internal {
 	 */
 	struct strmap conflicted;
 
+	/*
+	 * pool: memory pool for fast allocation/deallocation
+	 *
+	 * We allocate room for lots of filenames and auxiliary data
+	 * structures in merge_options_internal, and it tends to all be
+	 * freed together too.  Using a memory pool for these provides a
+	 * nice speedup.
+	 */
+	struct mem_pool internal_pool;
+	struct mem_pool *pool; /* NULL, or pointer to internal_pool */
+
 	/*
 	 * paths_to_free: additional list of strings to free
 	 *
@@ -603,6 +616,12 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_clear(&opti->output, 0);
 	}
 
+#if USE_MEMORY_POOL
+	mem_pool_discard(&opti->internal_pool, 0);
+	if (!reinitialize)
+		opti->pool = NULL;
+#endif
+
 	/* Clean out callback_data as well. */
 	FREE_AND_NULL(renames->callback_data);
 	renames->callback_data_nr = renames->callback_data_alloc = 0;
@@ -4368,6 +4387,12 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	/* Initialization of various renames fields */
 	renames = &opt->priv->renames;
+#if USE_MEMORY_POOL
+	mem_pool_init(&opt->priv->internal_pool, 0);
+	opt->priv->pool = &opt->priv->internal_pool;
+#else
+	opt->priv->pool = NULL;
+#endif
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
 					    NOT_RELEVANT, NULL, 0);
-- 
gitgitgadget
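
The "embedded pool plus nullable pointer" arrangement, together with the compile-time switch, can be sketched on its own as follows. Everything named toy_* or TOY_* is hypothetical. The stand-in pool merely records each malloc'd chunk so that one call releases them all; it demonstrates the lifetime handling only and does not reproduce the speedup of a real pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef TOY_USE_POOL
#define TOY_USE_POOL 1	/* set to 0 so leak checkers see each allocation */
#endif

/* Hypothetical stand-in pool: remembers chunks so they can be bulk-freed. */
struct toy_pool {
	void **chunks;
	size_t nr, alloc;
};

static void toy_pool_init(struct toy_pool *pool)
{
	pool->chunks = NULL;
	pool->nr = pool->alloc = 0;
}

static void *toy_pool_alloc(struct toy_pool *pool, size_t size)
{
	void *p = malloc(size ? size : 1);

	if (!p)
		abort();
	if (pool->nr == pool->alloc) {
		size_t alloc = pool->alloc ? 2 * pool->alloc : 16;
		void **chunks = realloc(pool->chunks, alloc * sizeof(*chunks));

		if (!chunks)
			abort();
		pool->chunks = chunks;
		pool->alloc = alloc;
	}
	pool->chunks[pool->nr++] = p;
	return p;
}

static void toy_pool_discard(struct toy_pool *pool)
{
	while (pool->nr)
		free(pool->chunks[--pool->nr]);
	free(pool->chunks);
	toy_pool_init(pool);
}

struct toy_state {
	struct toy_pool internal_pool;
	struct toy_pool *pool;	/* NULL, or pointer to internal_pool */
};

static void toy_state_start(struct toy_state *st)
{
#if TOY_USE_POOL
	toy_pool_init(&st->internal_pool);
	st->pool = &st->internal_pool;
#else
	st->pool = NULL;
#endif
}

static void toy_state_clear(struct toy_state *st, int reinitialize)
{
#if TOY_USE_POOL
	toy_pool_discard(&st->internal_pool);
	if (!reinitialize)
		st->pool = NULL;
#else
	(void)st;
	(void)reinitialize;
#endif
}
```

Keeping the pointer separate from the embedded pool means the rest of the code only ever tests "is st->pool NULL?", so flipping the macro to 0 routes every allocation back through the ordinary allocator without touching any call site.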



* [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-07-29  3:58   ` [PATCH v2 3/7] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
@ 2021-07-29  3:58   ` Elijah Newren via GitGitGadget
  2021-07-29 15:28     ` Jeff King
  2021-07-29  3:58   ` [PATCH v2 5/7] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-29  3:58 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For all the strmaps (including strintmaps and strsets) whose memory is
unconditionally freed as part of clear_or_reinit_internal_opts(), switch
them over to using our new memory pool.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      202.5  ms ±  3.2  ms    198.1 ms ±  2.6 ms
    mega-renames:      1.072 s ±  0.012 s    715.8 ms ±  4.0 ms
    just-one-mega:   357.3  ms ±  3.9  ms    276.8 ms ±  4.2 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 125 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 75 insertions(+), 50 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 2bca4b71f2a..5fd2a4ccd35 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -539,15 +539,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	void (*strset_func)(struct strset *) =
 		reinitialize ? strset_partial_clear : strset_clear;
 
-	/*
-	 * We marked opti->paths with strdup_strings = 0, so that we
-	 * wouldn't have to make another copy of the fullpath created by
-	 * make_traverse_path from setup_path_info().  But, now that we've
-	 * used it and have no other references to these strings, it is time
-	 * to deallocate them.
-	 */
-	free_strmap_strings(&opti->paths);
-	strmap_func(&opti->paths, 1);
+	if (opti->pool)
+		strmap_func(&opti->paths, 0);
+	else {
+		/*
+		 * We marked opti->paths with strdup_strings = 0, so that
+		 * we wouldn't have to make another copy of the fullpath
+		 * created by make_traverse_path from setup_path_info().
+		 * But, now that we've used it and have no other references
+		 * to these strings, it is time to deallocate them.
+		 */
+		free_strmap_strings(&opti->paths);
+		strmap_func(&opti->paths, 1);
+	}
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
@@ -556,16 +560,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	 */
 	strmap_func(&opti->conflicted, 0);
 
-	/*
-	 * opti->paths_to_free is similar to opti->paths; we created it with
-	 * strdup_strings = 0 to avoid making _another_ copy of the fullpath
-	 * but now that we've used it and have no other references to these
-	 * strings, it is time to deallocate them.  We do so by temporarily
-	 * setting strdup_strings to 1.
-	 */
-	opti->paths_to_free.strdup_strings = 1;
-	string_list_clear(&opti->paths_to_free, 0);
-	opti->paths_to_free.strdup_strings = 0;
+	if (!opti->pool) {
+		/*
+		 * opti->paths_to_free is similar to opti->paths; we
+		 * created it with strdup_strings = 0 to avoid making
+		 * _another_ copy of the fullpath but now that we've used
+		 * it and have no other references to these strings, it is
+		 * time to deallocate them.  We do so by temporarily
+		 * setting strdup_strings to 1.
+		 */
+		opti->paths_to_free.strdup_strings = 1;
+		string_list_clear(&opti->paths_to_free, 0);
+		opti->paths_to_free.strdup_strings = 0;
+	}
 
 	if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
 		discard_index(&opti->attr_index);
@@ -683,7 +690,6 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
-MAYBE_UNUSED
 static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 {
 	if (!pool)
@@ -691,7 +697,6 @@ static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 	return mem_pool_calloc(pool, count, size);
 }
 
-MAYBE_UNUSED
 static void *pool_alloc(struct mem_pool *pool, size_t size)
 {
 	if (!pool)
@@ -699,7 +704,6 @@ static void *pool_alloc(struct mem_pool *pool, size_t size)
 	return mem_pool_alloc(pool, size);
 }
 
-MAYBE_UNUSED
 static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
 {
 	if (!pool)
@@ -835,8 +839,9 @@ static void setup_path_info(struct merge_options *opt,
 	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
 	assert(resolved == (merged_version != NULL));
 
-	mi = xcalloc(1, resolved ? sizeof(struct merged_info) :
-				   sizeof(struct conflict_info));
+	mi = pool_calloc(opt->priv->pool, 1,
+			 resolved ? sizeof(struct merged_info) :
+				    sizeof(struct conflict_info));
 	mi->directory_name = current_dir_name;
 	mi->basename_offset = current_dir_name_len;
 	mi->clean = !!resolved;
@@ -1128,7 +1133,7 @@ static int collect_merge_info_callback(int n,
 	len = traverse_path_len(info, p->pathlen);
 
 	/* +1 in both of the following lines to include the NUL byte */
-	fullpath = xmalloc(len + 1);
+	fullpath = pool_alloc(opt->priv->pool, len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
@@ -1383,7 +1388,7 @@ static int handle_deferred_entries(struct merge_options *opt,
 		copy = renames->deferred[side].possible_trivial_merges;
 		strintmap_init_with_options(&renames->deferred[side].possible_trivial_merges,
 					    0,
-					    NULL,
+					    opt->priv->pool,
 					    0);
 		strintmap_for_each_entry(&copy, &iter, entry) {
 			const char *path = entry->key;
@@ -2335,12 +2340,21 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 	VERIFY_CI(ci);
 
 	/* Find parent directories missing from opt->priv->paths */
-	cur_path = new_path;
+	if (opt->priv->pool) {
+		cur_path = mem_pool_strdup(opt->priv->pool, new_path);
+		free((char*)new_path);
+		new_path = (char *)cur_path;
+	} else {
+		cur_path = new_path;
+	}
+
 	while (1) {
 		/* Find the parent directory of cur_path */
 		char *last_slash = strrchr(cur_path, '/');
 		if (last_slash) {
-			parent_name = xstrndup(cur_path, last_slash - cur_path);
+			parent_name = pool_strndup(opt->priv->pool,
+						   cur_path,
+						   last_slash - cur_path);
 		} else {
 			parent_name = opt->priv->toplevel_dir;
 			break;
@@ -2349,7 +2363,8 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		/* Look it up in opt->priv->paths */
 		entry = strmap_get_entry(&opt->priv->paths, parent_name);
 		if (entry) {
-			free((char*)parent_name);
+			if (!opt->priv->pool)
+				free((char*)parent_name);
 			parent_name = entry->key; /* reuse known pointer */
 			break;
 		}
@@ -2376,12 +2391,15 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		parent_name = cur_dir;
 	}
 
-	/*
-	 * We are removing old_path from opt->priv->paths.  old_path also will
-	 * eventually need to be freed, but it may still be used by e.g.
-	 * ci->pathnames.  So, store it in another string-list for now.
-	 */
-	string_list_append(&opt->priv->paths_to_free, old_path);
+	if (!opt->priv->pool) {
+		/*
+		 * We are removing old_path from opt->priv->paths.
+		 * old_path also will eventually need to be freed, but it
+		 * may still be used by e.g.  ci->pathnames.  So, store it
+		 * in another string-list for now.
+		 */
+		string_list_append(&opt->priv->paths_to_free, old_path);
+	}
 
 	assert(ci->filemask == 2 || ci->filemask == 4);
 	assert(ci->dirmask == 0);
@@ -2416,7 +2434,8 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		new_ci->stages[index].mode = ci->stages[index].mode;
 		oidcpy(&new_ci->stages[index].oid, &ci->stages[index].oid);
 
-		free(ci);
+		if (!opt->priv->pool)
+			free(ci);
 		ci = new_ci;
 	}
 
@@ -3623,7 +3642,8 @@ static void process_entry(struct merge_options *opt,
 		 * the directory to remain here, so we need to move this
 		 * path to some new location.
 		 */
-		CALLOC_ARRAY(new_ci, 1);
+		new_ci = pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
+
 		/* We don't really want new_ci->merged.result copied, but it'll
 		 * be overwritten below so it doesn't matter.  We also don't
 		 * want any directory mode/oid values copied, but we'll zero
@@ -3713,7 +3733,7 @@ static void process_entry(struct merge_options *opt,
 			const char *a_path = NULL, *b_path = NULL;
 			int rename_a = 0, rename_b = 0;
 
-			new_ci = xmalloc(sizeof(*new_ci));
+			new_ci = pool_alloc(opt->priv->pool, sizeof(*new_ci));
 
 			if (S_ISREG(a_mode))
 				rename_a = 1;
@@ -3777,12 +3797,14 @@ static void process_entry(struct merge_options *opt,
 				strmap_remove(&opt->priv->paths, path, 0);
 				/*
 				 * We removed path from opt->priv->paths.  path
-				 * will also eventually need to be freed, but
-				 * it may still be used by e.g.  ci->pathnames.
-				 * So, store it in another string-list for now.
+				 * will also eventually need to be freed if not
+				 * part of a memory pool...but it may still be
+				 * used by e.g. ci->pathnames.  So, store it in
+				 * another string-list for now in that case.
 				 */
-				string_list_append(&opt->priv->paths_to_free,
-						   path);
+				if (!opt->priv->pool)
+					string_list_append(&opt->priv->paths_to_free,
+							   path);
 			}
 
 			/*
@@ -4322,6 +4344,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 {
 	struct rename_info *renames;
 	int i;
+	struct mem_pool *pool = NULL;
 
 	/* Sanity checks on opt */
 	trace2_region_enter("merge", "sanity checks", opt->repo);
@@ -4393,9 +4416,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 #else
 	opt->priv->pool = NULL;
 #endif
+	pool = opt->priv->pool;
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
-					    NOT_RELEVANT, NULL, 0);
+					    NOT_RELEVANT, pool, 0);
 		strmap_init_with_options(&renames->dir_rename_count[i],
 					 NULL, 1);
 		strmap_init_with_options(&renames->dir_renames[i],
@@ -4409,7 +4433,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 		 */
 		strintmap_init_with_options(&renames->relevant_sources[i],
 					    -1 /* explicitly invalid */,
-					    NULL, 0);
+					    pool, 0);
 		strmap_init_with_options(&renames->cached_pairs[i],
 					 NULL, 1);
 		strset_init_with_options(&renames->cached_irrelevant[i],
@@ -4419,9 +4443,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	}
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->deferred[i].possible_trivial_merges,
-					    0, NULL, 0);
+					    0, pool, 0);
 		strset_init_with_options(&renames->deferred[i].target_dirs,
-					 NULL, 1);
+					 pool, 1);
 		renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
 	}
 
@@ -4434,9 +4458,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * In contrast, conflicted just has a subset of keys from paths, so
 	 * we don't want to free those (it'd be a duplicate free).
 	 */
-	strmap_init_with_options(&opt->priv->paths, NULL, 0);
-	strmap_init_with_options(&opt->priv->conflicted, NULL, 0);
-	string_list_init(&opt->priv->paths_to_free, 0);
+	strmap_init_with_options(&opt->priv->paths, pool, 0);
+	strmap_init_with_options(&opt->priv->conflicted, pool, 0);
+	if (!opt->priv->pool)
+		string_list_init(&opt->priv->paths_to_free, 0);
 
 	/*
 	 * keys & strbufs in output will sometimes need to outlive "paths",
-- 
gitgitgadget



* [PATCH v2 5/7] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-07-29  3:58   ` [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
@ 2021-07-29  3:58   ` Elijah Newren via GitGitGadget
  2021-07-29  3:58   ` [PATCH v2 6/7] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-29  3:58 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

We want to be able to allocate filespecs and filepairs using a mem_pool.
However, filespec data will still remain outside the pool (perhaps in
the future we could plumb the pool through the various diff APIs to
allocate the filespec data too, but for now we are limiting the scope).
Add functions that allocate these from the pool when opt->priv->pool is
non-NULL (and fall back to the regular allocators otherwise), along with
functions that correctly deallocate only the parts that live outside the
pool.  A future commit will make use of these new functions.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 41 +++++++++++++++++++++++++++++++++++++++++
 diffcore.h        |  2 ++
 merge-ort.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 73d884099eb..5bc559f79e9 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1328,6 +1328,47 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
 	rename_src_nr = new_num_src;
 }
 
+static void free_filespec_data(struct diff_filespec *spec)
+{
+	if (!--spec->count)
+		diff_free_filespec_data(spec);
+}
+
+MAYBE_UNUSED
+static void pool_free_filespec(struct mem_pool *pool,
+			       struct diff_filespec *spec)
+{
+	if (!pool) {
+		free_filespec(spec);
+		return;
+	}
+
+	/*
+	 * Similar to free_filespec(), but only frees the data.  The spec
+	 * itself was allocated in the pool and should not be individually
+	 * freed.
+	 */
+	free_filespec_data(spec);
+}
+
+MAYBE_UNUSED
+void pool_diff_free_filepair(struct mem_pool *pool,
+			     struct diff_filepair *p)
+{
+	if (!pool) {
+		diff_free_filepair(p);
+		return;
+	}
+
+	/*
+	 * Similar to diff_free_filepair() but only frees the data from the
+	 * filespecs; not the filespecs or the filepair which were
+	 * allocated from the pool.
+	 */
+	free_filespec_data(p->one);
+	free_filespec_data(p->two);
+}
+
 void diffcore_rename_extended(struct diff_options *options,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
diff --git a/diffcore.h b/diffcore.h
index 533b30e21e7..b58ee6b1934 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -127,6 +127,8 @@ struct diff_filepair {
 #define DIFF_PAIR_MODE_CHANGED(p) ((p)->one->mode != (p)->two->mode)
 
 void diff_free_filepair(struct diff_filepair *);
+void pool_diff_free_filepair(struct mem_pool *pool,
+			     struct diff_filepair *p);
 
 int diff_unmodified_pair(struct diff_filepair *);
 
diff --git a/merge-ort.c b/merge-ort.c
index 5fd2a4ccd35..59428e45884 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -690,6 +690,48 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
+MAYBE_UNUSED
+static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
+						 const char *path)
+{
+	struct diff_filespec *spec;
+	size_t len;
+
+	if (!pool)
+		return alloc_filespec(path);
+
+	/* Same code as alloc_filespec, except allocate from pool */
+	len = strlen(path);
+
+	spec = mem_pool_calloc(pool, 1, st_add3(sizeof(*spec), len, 1));
+	memcpy(spec+1, path, len);
+	spec->path = (void*)(spec+1);
+
+	spec->count = 1;
+	spec->is_binary = -1;
+	return spec;
+}
+
+MAYBE_UNUSED
+static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
+					     struct diff_queue_struct *queue,
+					     struct diff_filespec *one,
+					     struct diff_filespec *two)
+{
+	struct diff_filepair *dp;
+
+	if (!pool)
+		return diff_queue(queue, one, two);
+
+	/* Same code as diff_queue, except allocate from pool */
+	dp = mem_pool_calloc(pool, 1, sizeof(*dp));
+	dp->one = one;
+	dp->two = two;
+	if (queue)
+		diff_q(queue, dp);
+	return dp;
+}
+
 static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 {
 	if (!pool)
-- 
gitgitgadget
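
Two ideas from this patch can be shown in a small standalone sketch: storing the path directly behind the struct in a single allocation, and releasing only the separately owned data once the reference count drops to zero. The toy_spec type and its helpers are hypothetical simplifications, not git's struct diff_filespec or its API. The raw memory handed to toy_spec_init() could come from a pool or from the heap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a filespec. */
struct toy_spec {
	char *path;	/* points just past this struct */
	void *data;	/* lazily loaded contents; always heap-owned */
	int count;	/* reference count */
};

/* Bytes needed for the struct plus its trailing NUL-terminated path. */
static size_t toy_spec_size(const char *path)
{
	return sizeof(struct toy_spec) + strlen(path) + 1;
}

/* Set up a spec in raw, which must hold toy_spec_size(path) bytes. */
static struct toy_spec *toy_spec_init(void *raw, const char *path)
{
	struct toy_spec *spec = raw;

	memset(spec, 0, sizeof(*spec));
	spec->path = (char *)(spec + 1);
	memcpy(spec->path, path, strlen(path) + 1);
	spec->count = 1;
	return spec;
}

/*
 * Drop one reference.  When the last reference goes away, free only the
 * attached data; the spec itself may belong to a pool and is left alone.
 */
static void toy_spec_release_data(struct toy_spec *spec)
{
	if (!--spec->count) {
		free(spec->data);
		spec->data = NULL;
	}
}
```

Splitting "release the data" from "release the struct" is what lets the pooled and non-pooled paths share one reference-counting rule: in the pooled case the struct is reclaimed later in bulk, while the heap-owned data still has to be freed explicitly.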



* [PATCH v2 6/7] merge-ort: store filepairs and filespecs in our mem_pool
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-07-29  3:58   ` [PATCH v2 5/7] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
@ 2021-07-29  3:58   ` Elijah Newren via GitGitGadget
  2021-07-29  3:58   ` [PATCH v2 7/7] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-29  3:58 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

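Switch the filepair and filespec allocations in add_pair() and
use_cached_pairs(), and the corresponding frees, over to the pool-aware
wrappers added in the previous commit, and pass opt->priv->pool through
to diffcore_rename_extended().

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin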
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.1 ms ±  2.6 ms     198.5 ms ±  3.4 ms
    mega-renames:     715.8 ms ±  4.0 ms     679.1 ms ±  5.6 ms
    just-one-mega:    276.8 ms ±  4.2 ms     271.9 ms ±  2.8 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c |  9 ++++-----
 diffcore.h        |  1 +
 merge-ort.c       | 26 ++++++++++++++------------
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 5bc559f79e9..7e6b3e1b143 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1334,7 +1334,6 @@ static void free_filespec_data(struct diff_filespec *spec)
 		diff_free_filespec_data(spec);
 }
 
-MAYBE_UNUSED
 static void pool_free_filespec(struct mem_pool *pool,
 			       struct diff_filespec *spec)
 {
@@ -1351,7 +1350,6 @@ static void pool_free_filespec(struct mem_pool *pool,
 	free_filespec_data(spec);
 }
 
-MAYBE_UNUSED
 void pool_diff_free_filepair(struct mem_pool *pool,
 			     struct diff_filepair *p)
 {
@@ -1370,6 +1368,7 @@ void pool_diff_free_filepair(struct mem_pool *pool,
 }
 
 void diffcore_rename_extended(struct diff_options *options,
+			      struct mem_pool *pool,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
 			      struct strmap *dir_rename_count,
@@ -1683,7 +1682,7 @@ void diffcore_rename_extended(struct diff_options *options,
 			pair_to_free = p;
 
 		if (pair_to_free)
-			diff_free_filepair(pair_to_free);
+			pool_diff_free_filepair(pool, pair_to_free);
 	}
 	diff_debug_queue("done copying original", &outq);
 
@@ -1693,7 +1692,7 @@ void diffcore_rename_extended(struct diff_options *options,
 
 	for (i = 0; i < rename_dst_nr; i++)
 		if (rename_dst[i].filespec_to_free)
-			free_filespec(rename_dst[i].filespec_to_free);
+			pool_free_filespec(pool, rename_dst[i].filespec_to_free);
 
 	cleanup_dir_rename_info(&info, dirs_removed, dir_rename_count != NULL);
 	FREE_AND_NULL(rename_dst);
@@ -1710,5 +1709,5 @@ void diffcore_rename_extended(struct diff_options *options,
 
 void diffcore_rename(struct diff_options *options)
 {
-	diffcore_rename_extended(options, NULL, NULL, NULL, NULL);
+	diffcore_rename_extended(options, NULL, NULL, NULL, NULL, NULL);
 }
diff --git a/diffcore.h b/diffcore.h
index b58ee6b1934..badc2261c20 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -181,6 +181,7 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count);
 void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
 void diffcore_rename_extended(struct diff_options *options,
+			      struct mem_pool *pool,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
 			      struct strmap *dir_rename_count,
diff --git a/merge-ort.c b/merge-ort.c
index 59428e45884..d29c7fe8a30 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -690,7 +690,6 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
-MAYBE_UNUSED
 static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 						 const char *path)
 {
@@ -712,7 +711,6 @@ static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 	return spec;
 }
 
-MAYBE_UNUSED
 static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
 					     struct diff_queue_struct *queue,
 					     struct diff_filespec *one,
@@ -930,6 +928,7 @@ static void add_pair(struct merge_options *opt,
 		     unsigned dir_rename_mask)
 {
 	struct diff_filespec *one, *two;
+	struct mem_pool *pool = opt->priv->pool;
 	struct rename_info *renames = &opt->priv->renames;
 	int names_idx = is_add ? side : 0;
 
@@ -980,11 +979,11 @@ static void add_pair(struct merge_options *opt,
 			return;
 	}
 
-	one = alloc_filespec(pathname);
-	two = alloc_filespec(pathname);
+	one = pool_alloc_filespec(pool, pathname);
+	two = pool_alloc_filespec(pool, pathname);
 	fill_filespec(is_add ? two : one,
 		      &names[names_idx].oid, 1, names[names_idx].mode);
-	diff_queue(&renames->pairs[side], one, two);
+	pool_diff_queue(pool, &renames->pairs[side], one, two);
 }
 
 static void collect_rename_info(struct merge_options *opt,
@@ -2893,6 +2892,7 @@ static void use_cached_pairs(struct merge_options *opt,
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *entry;
+	struct mem_pool *pool = opt->priv->pool;
 
 	/*
 	 * Add to side_pairs all entries from renames->cached_pairs[side_index].
@@ -2906,9 +2906,9 @@ static void use_cached_pairs(struct merge_options *opt,
 			new_name = old_name;
 
 		/* We don't care about oid/mode, only filenames and status */
-		one = alloc_filespec(old_name);
-		two = alloc_filespec(new_name);
-		diff_queue(pairs, one, two);
+		one = pool_alloc_filespec(pool, old_name);
+		two = pool_alloc_filespec(pool, new_name);
+		pool_diff_queue(pool, pairs, one, two);
 		pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D';
 	}
 }
@@ -3016,6 +3016,7 @@ static int detect_regular_renames(struct merge_options *opt,
 	diff_queued_diff = renames->pairs[side_index];
 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diffcore_rename_extended(&diff_opts,
+				 opt->priv->pool,
 				 &renames->relevant_sources[side_index],
 				 &renames->dirs_removed[side_index],
 				 &renames->dir_rename_count[side_index],
@@ -3066,7 +3067,7 @@ static int collect_renames(struct merge_options *opt,
 
 		if (p->status != 'A' && p->status != 'R') {
 			possibly_cache_new_pair(renames, p, side_index, NULL);
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 			continue;
 		}
 
@@ -3079,7 +3080,7 @@ static int collect_renames(struct merge_options *opt,
 
 		possibly_cache_new_pair(renames, p, side_index, new_path);
 		if (p->status != 'R' && !new_path) {
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 			continue;
 		}
 
@@ -3197,7 +3198,7 @@ cleanup:
 		side_pairs = &renames->pairs[s];
 		for (i = 0; i < side_pairs->nr; ++i) {
 			struct diff_filepair *p = side_pairs->queue[i];
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 		}
 	}
 
@@ -3210,7 +3211,8 @@ simple_cleanup:
 	if (combined.nr) {
 		int i;
 		for (i = 0; i < combined.nr; i++)
-			diff_free_filepair(combined.queue[i]);
+			pool_diff_free_filepair(opt->priv->pool,
+						combined.queue[i]);
 		free(combined.queue);
 	}
 
-- 
gitgitgadget



* [PATCH v2 7/7] merge-ort: reuse path strings in pool_alloc_filespec
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-07-29  3:58   ` [PATCH v2 6/7] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
@ 2021-07-29  3:58   ` Elijah Newren via GitGitGadget
  2021-07-29 14:58   ` [PATCH v2 0/7] Final optimization batch (#15): use memory pools Derrick Stolee
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-29  3:58 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

pool_alloc_filespec() was written so that the code when pool != NULL
mimicked the code from alloc_filespec(), which included allocating
enough extra space for the path and then copying it.  However, the path
passed to pool_alloc_filespec() is always going to already be in the
same memory pool, so we may as well reuse it instead of copying it.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.5 ms ±  3.4 ms     198.3 ms ±  2.9 ms
    mega-renames:     679.1 ms ±  5.6 ms     661.8 ms ±  5.9 ms
    just-one-mega:    271.9 ms ±  2.8 ms     264.6 ms ±  2.5 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index d29c7fe8a30..0fb942692a7 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -694,17 +694,13 @@ static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 						 const char *path)
 {
 	struct diff_filespec *spec;
-	size_t len;
 
 	if (!pool)
 		return alloc_filespec(path);
 
-	/* Same code as alloc_filespec, except allocate from pool */
-	len = strlen(path);
-
-	spec = mem_pool_calloc(pool, 1, st_add3(sizeof(*spec), len, 1));
-	memcpy(spec+1, path, len);
-	spec->path = (void*)(spec+1);
+	/* Similar to alloc_filespec, but allocate from pool and reuse path */
+	spec = mem_pool_calloc(pool, 1, sizeof(*spec));
+	spec->path = (char*)path; /* spec won't modify it */
 
 	spec->count = 1;
 	spec->is_binary = -1;
@@ -2904,6 +2900,25 @@ static void use_cached_pairs(struct merge_options *opt,
 		const char *new_name = entry->value;
 		if (!new_name)
 			new_name = old_name;
+		if (pool) {
+			/*
+			 * cached_pairs has *copies* of old_name and new_name,
+			 * because it has to persist across merges.  When
+			 *   pool != NULL
+			 * pool_alloc_filespec() will just re-use the existing
+			 * filenames, which will also get re-used by
+			 * opt->priv->paths if they become renames, and then
+			 * get freed at the end of the merge, leaving the copy
+			 * in cached_pairs dangling.  Avoid this by making a
+			 * copy here.
+			 *
+			 * When pool == NULL, pool_alloc_filespec() calls
+			 * alloc_filespec(), which makes a copy; we don't want
+			 * to add another.
+			 */
+			old_name = mem_pool_strdup(pool, old_name);
+			new_name = mem_pool_strdup(pool, new_name);
+		}
 
 		/* We don't care about oid/mode, only filenames and status */
 		one = pool_alloc_filespec(pool, old_name);
-- 
gitgitgadget
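
The ownership change described in this patch can be sketched outside of git. The following is a minimal illustration only, assuming a hypothetical fixed-size bump arena (struct toy_pool) and a stripped-down struct toy_spec; none of these names exist in git, and the real struct mem_pool and struct diff_filespec are more involved. It contrasts the old "allocate struct plus a private copy of the path" layout with the new "borrow the caller's path" layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy arena; git's real struct mem_pool grows in blocks. */
struct toy_pool {
	char *buf;
	size_t used, cap;
};

static void toy_pool_init(struct toy_pool *p, size_t cap)
{
	p->buf = calloc(1, cap);
	if (!p->buf)
		abort();
	p->used = 0;
	p->cap = cap;
}

static void *toy_pool_alloc(struct toy_pool *p, size_t len)
{
	void *ret;

	len = (len + 15) & ~(size_t)15; /* keep allocations 16-byte aligned */
	if (p->cap - p->used < len)
		abort(); /* the toy arena never grows */
	ret = p->buf + p->used;
	p->used += len;
	return ret;
}

static char *toy_pool_strdup(struct toy_pool *p, const char *s)
{
	size_t len = strlen(s) + 1;
	return memcpy(toy_pool_alloc(p, len), s, len);
}

/* Stand-in for diff_filespec: only the fields needed for the sketch. */
struct toy_spec {
	const char *path;
	int count;
};

/* Old shape: the struct and a private copy of the path share one allocation. */
static struct toy_spec *spec_copying(struct toy_pool *p, const char *path)
{
	size_t len = strlen(path);
	struct toy_spec *spec = toy_pool_alloc(p, sizeof(*spec) + len + 1);

	memcpy(spec + 1, path, len + 1);
	spec->path = (const char *)(spec + 1);
	spec->count = 1;
	return spec;
}

/* New shape: borrow the caller's path, which must live as long as the spec. */
static struct toy_spec *spec_borrowing(struct toy_pool *p, const char *path)
{
	struct toy_spec *spec = toy_pool_alloc(p, sizeof(*spec));

	spec->path = path;
	spec->count = 1;
	return spec;
}
```

Borrowing is only safe when the string shares the pool's lifetime. That is why use_cached_pairs() in the patch first copies the cached names into the pool with mem_pool_strdup(): the strings in cached_pairs have to persist across merges, so they must not end up aliased by structures that are torn down at the end of a merge.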


* Re: [PATCH v2 0/7] Final optimization batch (#15): use memory pools
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-07-29  3:58   ` [PATCH v2 7/7] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
@ 2021-07-29 14:58   ` Derrick Stolee
  2021-07-29 16:20   ` Jeff King
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-07-29 14:58 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Jeff King, Eric Sunshine, Elijah Newren

On 7/28/2021 11:58 PM, Elijah Newren via GitGitGadget wrote:
> This series textually depends on en/ort-perf-batch-14, but the ideas are
> orthogonal to it and orthogonal to previous series. It can be reviewed
> independently.
> 
> This series is more about strmaps & memory pools than merge logic. CC'ing
> Peff since he reviewed the strmap work[1], and that work included a number
> of decisions that specifically had this series in mind.
> 
> Changes since v1, addressing Eric's feedback:
> 
>  * Fixed a comment that became out-of-date in patch 1
>  * Swapped commits 2 and 3 so that one can better motivate the other.

Changes look good to me. Thanks!
 
> Note: Stolee also had an interesting question about whether we should tweak
> the mem_pool_*() API; he and I were both worried it was a bit much, so I've
> left it out unless others on list chime in with their opinions on that
> change.

This was mostly a thought experiment on my part. There is no need to
decide one way or another in this series since it would be easy to
adapt what you have here to match a change to the mem_pool_*() API
if we thought that was a good idea. (Still not sure it is.)

Thanks,
-Stolee


* Re: [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  2021-07-28 22:49     ` Elijah Newren
@ 2021-07-29 15:26       ` Jeff King
  2021-07-30  2:27         ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Jeff King @ 2021-07-29 15:26 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

On Wed, Jul 28, 2021 at 04:49:18PM -0600, Elijah Newren wrote:

> On Mon, Jul 26, 2021 at 8:36 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 7/23/2021 8:54 AM, Elijah Newren via GitGitGadget wrote:
> > > From: Elijah Newren <newren@gmail.com>
> > >
> > > We need functions which will either call
> > >     xmalloc, xcalloc, xstrndup
> > > or
> > >     mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
> > > depending on whether we have a non-NULL memory pool.  Add these
> > > functions; the next commit will make use of these.
> >
> > I briefly considered that this should just be the way the
> > mem_pool_* methods work. It does rely on the caller knowing
> > to free() the allocated memory when their pool is NULL, so
> > perhaps such a universal change might be too much. What do
> > you think?
> 
> That's interesting, but I'm worried it might be a bit much.  Do others
> on the list have an opinion here?

FWIW, I had the same thought. You can also provide a helper to make the
freeing side nicer:

  static void mem_pool_free(struct mem_pool *m, void *ptr)
  {
	if (m)
		return; /* will be freed when pool frees */
	free(ptr);
  }

We do something similar with unuse_commit_buffer(), where the caller
isn't aware of whether we pulled the buffer from cache or allocated it
especially for them.

-Peff
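
For readers following along, the "pool if non-NULL, heap otherwise" idea discussed above, together with the matching free helper, might look roughly like the sketch below. It is an illustration under assumptions: struct toy_pool and the pool_or_heap_*() names are hypothetical stand-ins, not git's mem_pool API or the wrappers added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size bump arena standing in for struct mem_pool. */
struct toy_pool {
	char *buf;
	size_t used, cap;
};

static void *toy_pool_alloc(struct toy_pool *p, size_t len)
{
	void *ret;

	len = (len + 15) & ~(size_t)15; /* keep allocations 16-byte aligned */
	if (p->cap - p->used < len)
		abort(); /* the toy arena never grows */
	ret = p->buf + p->used;
	p->used += len;
	return ret;
}

/* Allocate from the pool when there is one, otherwise from the heap. */
static void *pool_or_heap_alloc(struct toy_pool *p, size_t len)
{
	void *ret;

	if (p)
		return toy_pool_alloc(p, len);
	ret = malloc(len ? len : 1);
	if (!ret)
		abort();
	return ret;
}

/* Copy at most n bytes of s and NUL-terminate, like strndup(). */
static char *pool_or_heap_strndup(struct toy_pool *p, const char *s, size_t n)
{
	const char *end = memchr(s, '\0', n);
	size_t len = end ? (size_t)(end - s) : n;
	char *ret = pool_or_heap_alloc(p, len + 1);

	memcpy(ret, s, len);
	ret[len] = '\0';
	return ret;
}

/* Freeing side: a no-op for pool memory, free() for heap memory. */
static void pool_or_heap_free(struct toy_pool *p, void *ptr)
{
	if (p)
		return; /* released when the whole pool is discarded */
	free(ptr);
}
```

With a helper on the freeing side, callers pass the same (possibly NULL) pool pointer to both halves and never need to branch on it themselves.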


* Re: [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools
  2021-07-29  3:58   ` [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
@ 2021-07-29 15:28     ` Jeff King
  2021-07-29 18:37       ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Jeff King @ 2021-07-29 15:28 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee

On Thu, Jul 29, 2021 at 03:58:38AM +0000, Elijah Newren via GitGitGadget wrote:

> diff --git a/merge-ort.c b/merge-ort.c
> index 2bca4b71f2a..5fd2a4ccd35 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -539,15 +539,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>  	void (*strset_func)(struct strset *) =
>  		reinitialize ? strset_partial_clear : strset_clear;
>  
> -	/*
> -	 * We marked opti->paths with strdup_strings = 0, so that we
> -	 * wouldn't have to make another copy of the fullpath created by
> -	 * make_traverse_path from setup_path_info().  But, now that we've
> -	 * used it and have no other references to these strings, it is time
> -	 * to deallocate them.
> -	 */
> -	free_strmap_strings(&opti->paths);
> -	strmap_func(&opti->paths, 1);
> +	if (opti->pool)
> +		strmap_func(&opti->paths, 0);

This isn't new in your patch here, but I did scratch my head a bit over
what "strmap_func" is. It's a bit less confusing if you read the whole
function (as opposed to a diff), since then you're more likely to see
the definition. But something like "strmap_clear_func()" would have been
a lot less confusing.

Arguably, the existence of these function indirections is perhaps a sign
that the strmap API should provide a version of the clear functions that
takes "partial / not-partial" as a parameter.

(Again, not really part of this patch series, but I hadn't looked at
some of the earlier optimization steps).

-Peff
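
To make the two shapes being compared concrete, here is a small sketch. It assumes a hypothetical struct toy_map with toy_map_clear() and toy_map_partial_clear(); it is not the strmap API. The first function mirrors the current indirection (with the more descriptive pointer name suggested above), the second shows the choice passed as a parameter instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container: a table plus a count of live entries. */
struct toy_map {
	int *table;
	size_t table_size;
	size_t nr;
};

/* Drop the entries but keep the table allocated for reuse. */
static void toy_map_partial_clear(struct toy_map *m)
{
	m->nr = 0;
}

/* Drop the entries and release the table as well. */
static void toy_map_clear(struct toy_map *m)
{
	free(m->table);
	m->table = NULL;
	m->table_size = 0;
	m->nr = 0;
}

/* Shape 1: the caller selects a function through a local pointer. */
static void reset_via_pointer(struct toy_map *m, int reinitialize)
{
	void (*toy_map_clear_func)(struct toy_map *) =
		reinitialize ? toy_map_partial_clear : toy_map_clear;

	toy_map_clear_func(m);
}

/* Shape 2: the API itself takes "partial / not-partial" as a parameter. */
static void toy_map_clear_ext(struct toy_map *m, int partial)
{
	if (partial)
		toy_map_partial_clear(m);
	else
		toy_map_clear(m);
}
```

Both forms behave identically; the difference is only whether the selection lives in every caller or once inside the API.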


* Re: [PATCH v2 0/7] Final optimization batch (#15): use memory pools
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-07-29 14:58   ` [PATCH v2 0/7] Final optimization batch (#15): use memory pools Derrick Stolee
@ 2021-07-29 16:20   ` Jeff King
  2021-07-29 16:23     ` Jeff King
  2021-07-29 20:46     ` Elijah Newren
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
  9 siblings, 2 replies; 65+ messages in thread
From: Jeff King @ 2021-07-29 16:20 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee

On Thu, Jul 29, 2021 at 03:58:34AM +0000, Elijah Newren via GitGitGadget wrote:

> This series is more about strmaps & memory pools than merge logic. CC'ing
> Peff since he reviewed the strmap work[1], and that work included a number
> of decisions that specifically had this series in mind.

I haven't been following the other optimization threads very closely,
but I'll try to give my general impressions.

> === Basic Optimization idea ===
> 
> In this series, I make use of memory pools to get faster allocations and
> deallocations for many data structures that tend to all be deallocated at
> the same time anyway.
> 
> === Results ===
> 
> For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
> performance work; instrument with trace2_region_* calls", 2020-10-28), the
> changes in just this series improves the performance as follows:
> 
>                      Before Series           After Series
> no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
> mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
> just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms

Pretty good results for the mega-renames case. I do wonder how much this
matters in practice. That case is intentionally stressing the system,
though I guess it's not too far-fetched (it's mostly a big directory
rename). However, just "git checkout" across the rename already takes
more than a second. So on the one hand, 400ms isn't nothing. On the
other, I doubt anybody is likely to notice in the grand scheme of
things.

And we're paying a non-trivial cost in code complexity to do it (though
I do think you've done an admirable job of making that cost as low as
possible). Dropping the USE_MEMORY_POOL flag and just always using a
pool would make a lot of that complexity go away. I understand how it
makes leak hunting harder, but I think simpler code would probably be
worth the tradeoff (and in a sense, there _aren't_ leaks in an
always-pool world; we're holding on to all of the memory through the
whole operation).

I assume your tests are just done using the regular glibc allocator. I
also wondered how plugging in a better allocator might fare. Here are
timings I did of your mega-renames case with three binaries: one built
with USE_MEMORY_POOL set to 0, one with it set to 1, and one with it set
to 0 but adding "-ltcmalloc" to EXTLIBS via config.mak.

  $ hyperfine \
      -p 'git checkout hwmon-updates &&
          git reset --hard fd8bdb23b91876ac1e624337bb88dc1dcc21d67e &&
          git checkout 5.4-renames^0' \
      -L version nopool,pool,tcmalloc \
      './test-tool.{version} fast-rebase --onto HEAD base hwmon-updates'
  
  Benchmark #1: ./test-tool.nopool fast-rebase --onto HEAD base hwmon-updates
    Time (mean ± σ):     921.1 ms ± 146.0 ms    [User: 843.0 ms, System: 77.5 ms]
    Range (min … max):   660.9 ms … 1112.2 ms    10 runs
   
  Benchmark #2: ./test-tool.pool fast-rebase --onto HEAD base hwmon-updates
    Time (mean ± σ):     635.4 ms ± 125.5 ms    [User: 563.7 ms, System: 71.3 ms]
    Range (min … max):   496.8 ms … 856.7 ms    10 runs
   
  Benchmark #3: ./test-tool.tcmalloc fast-rebase --onto HEAD base hwmon-updates
    Time (mean ± σ):     727.3 ms ± 139.9 ms    [User: 654.1 ms, System: 72.9 ms]
    Range (min … max):   476.3 ms … 900.5 ms    10 runs
   
  Summary
    './test-tool.pool fast-rebase --onto HEAD base hwmon-updates' ran
      1.14 ± 0.32 times faster than './test-tool.tcmalloc fast-rebase --onto HEAD base hwmon-updates'
      1.45 ± 0.37 times faster than './test-tool.nopool fast-rebase --onto HEAD base hwmon-updates'

The pool allocator does come out ahead when comparing means, but the
improvement is within the noise (and the fastest run was actually with
tcmalloc).
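
As a worked check of the summary lines, the reported ratios and their uncertainties are consistent with dividing the means and combining the relative standard deviations in quadrature. That this is the formula behind the numbers is an assumption (it is the usual first-order propagation for independent measurements), but it reproduces the output to the digits shown:

```latex
\[
r_{\text{nopool}} = \frac{921.1}{635.4} \approx 1.45,
\qquad
\sigma_{r} \approx 1.45\,\sqrt{\left(\frac{146.0}{921.1}\right)^{2}
  + \left(\frac{125.5}{635.4}\right)^{2}}
  \approx 1.45 \times 0.253 \approx 0.37
\]
\[
r_{\text{tcmalloc}} = \frac{727.3}{635.4} \approx 1.14,
\qquad
\sigma_{r} \approx 1.14\,\sqrt{\left(\frac{139.9}{727.3}\right)^{2}
  + \left(\frac{125.5}{635.4}\right)^{2}}
  \approx 1.14 \times 0.276 \approx 0.32
\]
```

Since 1.14 - 0.32 is below 1, the pool-versus-tcmalloc difference cannot be distinguished from no difference at one standard deviation in these runs.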

I was also curious about peak heap usage. According to massif, the pool
version peaks at ~800k extra (out of 82MB), which is negligible. Plus it
has fewer overall allocations, so it seems to actually save 4-5MB in
malloc overhead (though I would imagine that varies between allocators;
I'm just going from massif numbers here).

So...I dunno. It's hard to assess the real-world impact of the speedup,
compared to the complexity cost. Ultimately, this is changing code that
you wrote and will probably maintain. So I'd leave the final decision
for that tradeoff to you. I'm just injecting some thoughts and numbers. :)

-Peff


* Re: [PATCH v2 0/7] Final optimization batch (#15): use memory pools
  2021-07-29 16:20   ` Jeff King
@ 2021-07-29 16:23     ` Jeff King
  2021-07-29 19:46       ` Junio C Hamano
  2021-07-29 20:46     ` Elijah Newren
  1 sibling, 1 reply; 65+ messages in thread
From: Jeff King @ 2021-07-29 16:23 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee

On Thu, Jul 29, 2021 at 12:20:13PM -0400, Jeff King wrote:

> I assume your tests are just done using the regular glibc allocator. I
> also wondered how plugging in a better allocator might fare. Here are
> timings I did of your mega-renames case with three binaries: one built
> with USE_MEMORY_POOL set to 0, one with it set to 1, and one with it set
> to 0 but adding "-ltcmalloc" to EXTLIBS via config.mak.

Oh, btw, I wasn't able to apply your series from the list on top of
en/ort-perf-batch-14 (there are some problems in patch 4, and "am -3"
says my clone of git.git is missing some of the pre-image sha1s). I
fetched ort-perf-batch-15 from https://github.com/newren/git and timed
that, which I imagine is the same. But you may need to tweak the patches
so that Junio can pick them up.

-Peff


* Re: [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools
  2021-07-29 15:28     ` Jeff King
@ 2021-07-29 18:37       ` Elijah Newren
  2021-07-29 20:09         ` Jeff King
  0 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-07-29 18:37 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee

On Thu, Jul 29, 2021 at 9:29 AM Jeff King <peff@peff.net> wrote:
>
> On Thu, Jul 29, 2021 at 03:58:38AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 2bca4b71f2a..5fd2a4ccd35 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -539,15 +539,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
> >       void (*strset_func)(struct strset *) =
> >               reinitialize ? strset_partial_clear : strset_clear;
> >
> > -     /*
> > -      * We marked opti->paths with strdup_strings = 0, so that we
> > -      * wouldn't have to make another copy of the fullpath created by
> > -      * make_traverse_path from setup_path_info().  But, now that we've
> > -      * used it and have no other references to these strings, it is time
> > -      * to deallocate them.
> > -      */
> > -     free_strmap_strings(&opti->paths);
> > -     strmap_func(&opti->paths, 1);
> > +     if (opti->pool)
> > +             strmap_func(&opti->paths, 0);
>
> This isn't new in your patch here, but I did scratch my head a bit over
> what "strmap_func" is. It's a bit less confusing if you read the whole
> function (as opposed to a diff), since then you're more likely to see
> the definition. But something like "strmap_clear_func()" would have been
> a lot less confusing.

Makes sense.

> Arguably, the existence of these function indirections is perhaps a sign
> that the strmap API should provide a version of the clear functions that
> takes "partial / not-partial" as a parameter.

Are you suggesting a modification of str{map,intmap,set}_clear() to
take an extra parameter, or removing the
str{map,intmap,set}_partial_clear() functions and introducing new
functions that take a partial/not-partial parameter?  I think you're
suggesting the latter, and that makes more sense to me...but I'm
drawing blanks trying to come up with a reasonable function name.

(If it helps for context -- the only current callers of the
*_partial_clear() functions are found in diffcore-rename.c and
merge-ort.c, so it'd be a pretty easy change to make to those.  There
are additionally some callers of strmap_clear() and strset_clear() in
builtin/shortlog.c and rerere.c, and it'd be nice to avoid exposing
those to the complexity of the partial clearing.)

> (Again, not really part of this patch series, but I hadn't looked at
> some of the earlier optimization steps).

Yeah, but this is the kind of reason I wanted you to review this
series, because I figured you might have more good comments on the
str{map,intmap,set} API calls.  :-)


* Re: [PATCH v2 0/7] Final optimization batch (#15): use memory pools
  2021-07-29 16:23     ` Jeff King
@ 2021-07-29 19:46       ` Junio C Hamano
  2021-07-29 20:48         ` Junio C Hamano
  0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-07-29 19:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren via GitGitGadget, git, Eric Sunshine, Elijah Newren,
	Derrick Stolee

Jeff King <peff@peff.net> writes:

> On Thu, Jul 29, 2021 at 12:20:13PM -0400, Jeff King wrote:
>
>> I assume your tests are just done using the regular glibc allocator. I
>> also wondered how plugging in a better allocator might fare. Here are
>> timings I did of your mega-renames case with three binaries: one built
>> with USE_MEMORY_POOL set to 0, one with it set to 1, and one with it set
>> to 0 but adding "-ltcmalloc" to EXTLIBS via config.mak.
>
> Oh, btw, I wasn't able to apply your series from the list on top of
> en/ort-perf-batch-14 (there are some problems in patch 4, and "am -3"
> says my clone of git.git is missing some of the pre-image sha1s). I
> fetched ort-perf-batch-15 from https://github.com/newren/git and timed
> that, which I imagine is the same. But you may need to tweak the patches
> so that Junio can pick them up.

Thanks, but the batch #15 has been in 'seen' since 23rd ;-)


* Re: [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools
  2021-07-29 18:37       ` Elijah Newren
@ 2021-07-29 20:09         ` Jeff King
  2021-07-30  2:30           ` Elijah Newren
  2021-07-30 13:30           ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 65+ messages in thread
From: Jeff King @ 2021-07-29 20:09 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee

On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote:

> > Arguably, the existence of these function indirections is perhaps a sign
> > that the strmap API should provide a version of the clear functions that
> > takes "partial / not-partial" as a parameter.
> 
> Are you suggesting a modification of str{map,intmap,set}_clear() to
> take an extra parameter, or removing the
> str{map,intmap,set}_partial_clear() functions and introducing new
> functions that take a partial/not-partial parameter?  I think you're
> suggesting the latter, and that makes more sense to me...but I'm
> drawing blanks trying to come up with a reasonable function name.

It does seem a shame to add the "partial" parameter to strmap_clear(),
just because most callers don't need it (so they end up with this
inscrutable "0" parameter).

What if there was a flags field? Then it could be combined with the
free_values parameter. The result is kind of verbose in two ways:

 - now strset_clear(), etc, need a "flags" parameter, which they didn't
   before (and is just "0" most of the time!)

 - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
   That's a lot longer, though arguably it's easier to understand since
   the boolean is explained.

Having gone through the exercise, I am not sure it is actually making
anything more readable (messy patch is below for reference).

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 3e7ab1ca82..dfbdba53da 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -242,7 +242,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	}
 
-	strset_clear(&dups);
+	strset_clear(&dups, 0);
 	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 7e6b3e1b14..0c960111d1 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -665,9 +665,10 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
 
 	strmap_for_each_entry(dir_rename_count, &iter, entry) {
 		struct strintmap *counts = entry->value;
-		strintmap_clear(counts);
+		strintmap_clear(counts, 0);
 	}
-	strmap_partial_clear(dir_rename_count, 1);
+	strmap_clear(dir_rename_count,
+		     STRMAP_FREE_VALUES | STRMAP_PARTIAL_CLEAR);
 }
 
 static void cleanup_dir_rename_info(struct dir_rename_info *info,
@@ -683,15 +684,15 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
 		return;
 
 	/* idx_map */
-	strintmap_clear(&info->idx_map);
+	strintmap_clear(&info->idx_map, 0);
 
 	/* dir_rename_guess */
-	strmap_clear(&info->dir_rename_guess, 1);
+	strmap_clear(&info->dir_rename_guess, STRMAP_FREE_VALUES);
 
 	/* relevant_source_dirs */
 	if (info->relevant_source_dirs &&
 	    info->relevant_source_dirs != dirs_removed) {
-		strintmap_clear(info->relevant_source_dirs);
+		strintmap_clear(info->relevant_source_dirs, 0);
 		FREE_AND_NULL(info->relevant_source_dirs);
 	}
 
@@ -716,7 +717,7 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
 
 		if (!strintmap_get(dirs_removed, source_dir)) {
 			string_list_append(&to_remove, source_dir);
-			strintmap_clear(counts);
+			strintmap_clear(counts, 0);
 			continue;
 		}
 
@@ -1045,8 +1046,8 @@ static int find_basename_matches(struct diff_options *options,
 		}
 	}
 
-	strintmap_clear(&sources);
-	strintmap_clear(&dests);
+	strintmap_clear(&sources, 0);
+	strintmap_clear(&dests, 0);
 
 	return renames;
 }
@@ -1700,7 +1701,7 @@ void diffcore_rename_extended(struct diff_options *options,
 	FREE_AND_NULL(rename_src);
 	rename_src_nr = rename_src_alloc = 0;
 	if (break_idx) {
-		strintmap_clear(break_idx);
+		strintmap_clear(break_idx, 0);
 		FREE_AND_NULL(break_idx);
 	}
 	trace2_region_leave("diff", "write back to queue", options->repo);
diff --git a/merge-ort.c b/merge-ort.c
index 0fb942692a..0765e23577 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -532,15 +532,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 {
 	struct rename_info *renames = &opti->renames;
 	int i;
-	void (*strmap_func)(struct strmap *, int) =
-		reinitialize ? strmap_partial_clear : strmap_clear;
-	void (*strintmap_func)(struct strintmap *) =
-		reinitialize ? strintmap_partial_clear : strintmap_clear;
-	void (*strset_func)(struct strset *) =
-		reinitialize ? strset_partial_clear : strset_clear;
+	unsigned flags = reinitialize ? STRMAP_PARTIAL_CLEAR : 0;
 
 	if (opti->pool)
-		strmap_func(&opti->paths, 0);
+		strmap_clear(&opti->paths, flags);
 	else {
 		/*
 		 * We marked opti->paths with strdup_strings = 0, so that
@@ -550,15 +545,15 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		 * to these strings, it is time to deallocate them.
 		 */
 		free_strmap_strings(&opti->paths);
-		strmap_func(&opti->paths, 1);
+		strmap_clear(&opti->paths, flags | STRMAP_FREE_VALUES);
 	}
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
 	 * opti->paths.  We don't want to deallocate anything twice, so we
 	 * don't free the keys and we pass 0 for free_values.
 	 */
-	strmap_func(&opti->conflicted, 0);
+	strmap_clear(&opti->conflicted, flags);
 
 	if (!opti->pool) {
 		/*
@@ -579,24 +574,24 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 
 	/* Free memory used by various renames maps */
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		strintmap_func(&renames->dirs_removed[i]);
-		strmap_func(&renames->dir_renames[i], 0);
-		strintmap_func(&renames->relevant_sources[i]);
+		strintmap_clear(&renames->dirs_removed[i], flags);
+		strmap_clear(&renames->dir_renames[i], flags);
+		strintmap_clear(&renames->relevant_sources[i], flags);
 		if (!reinitialize)
 			assert(renames->cached_pairs_valid_side == 0);
 		if (i != renames->cached_pairs_valid_side &&
 		    -1 != renames->cached_pairs_valid_side) {
-			strset_func(&renames->cached_target_names[i]);
-			strmap_func(&renames->cached_pairs[i], 1);
-			strset_func(&renames->cached_irrelevant[i]);
+			strset_clear(&renames->cached_target_names[i], flags);
+			strmap_clear(&renames->cached_pairs[i], flags | STRMAP_FREE_VALUES);
+			strset_clear(&renames->cached_irrelevant[i], flags);
 			partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
 			if (!reinitialize)
 				strmap_clear(&renames->dir_rename_count[i], 1);
 		}
 	}
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		strintmap_func(&renames->deferred[i].possible_trivial_merges);
-		strset_func(&renames->deferred[i].target_dirs);
+		strintmap_clear(&renames->deferred[i].possible_trivial_merges, flags);
+		strset_clear(&renames->deferred[i].target_dirs, flags);
 		renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
 	}
 	renames->cached_pairs_valid_side = 0;
@@ -1482,7 +1477,7 @@ static int handle_deferred_entries(struct merge_options *opt,
 			if (ret < 0)
 				return ret;
 		}
-		strintmap_clear(&copy);
+		strintmap_clear(&copy, 0);
 		strintmap_for_each_entry(&renames->deferred[side].possible_trivial_merges,
 					 &iter, entry) {
 			const char *path = entry->key;
diff --git a/strmap.c b/strmap.c
index 4fb9f6100e..7343800df5 100644
--- a/strmap.c
+++ b/strmap.c
@@ -37,10 +37,11 @@ void strmap_init_with_options(struct strmap *map,
 	map->strdup_strings = strdup_strings;
 }
 
-static void strmap_free_entries_(struct strmap *map, int free_values)
+static void strmap_free_entries_(struct strmap *map, unsigned flags)
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *e;
+	int free_values = flags & STRMAP_FREE_VALUES;
 
 	if (!map)
 		return;
@@ -64,16 +65,13 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	}
 }
 
-void strmap_clear(struct strmap *map, int free_values)
+void strmap_clear(struct strmap *map, unsigned flags)
 {
-	strmap_free_entries_(map, free_values);
-	hashmap_clear(&map->map);
-}
-
-void strmap_partial_clear(struct strmap *map, int free_values)
-{
-	strmap_free_entries_(map, free_values);
-	hashmap_partial_clear(&map->map);
+	strmap_free_entries_(map, flags);
+	if (flags & STRMAP_PARTIAL_CLEAR)
+		hashmap_partial_clear(&map->map);
+	else
+		hashmap_clear(&map->map);
 }
 
 static struct strmap_entry *create_entry(struct strmap *map,
diff --git a/strmap.h b/strmap.h
index 1e152d832d..d03d451654 100644
--- a/strmap.h
+++ b/strmap.h
@@ -46,16 +46,14 @@ void strmap_init_with_options(struct strmap *map,
 			      struct mem_pool *pool,
 			      int strdup_strings);
 
-/*
- * Remove all entries from the map, releasing any allocated resources.
- */
-void strmap_clear(struct strmap *map, int free_values);
+#define STRMAP_FREE_VALUES 1 /* 1 for historical compat, but we should probably
+				update callers to use the correct name */
+#define STRMAP_PARTIAL_CLEAR 2
 
 /*
- * Similar to strmap_clear() but leaves map->map->table allocated and
- * pre-sized so that subsequent uses won't need as many rehashings.
+ * Remove all entries from the map, releasing any allocated resources.
  */
-void strmap_partial_clear(struct strmap *map, int free_values);
+void strmap_clear(struct strmap *map, unsigned flags);
 
 /*
  * Insert "str" into the map, pointing to "data".
@@ -148,14 +146,10 @@ static inline void strintmap_init_with_options(struct strintmap *map,
 	map->default_value = default_value;
 }
 
-static inline void strintmap_clear(struct strintmap *map)
-{
-	strmap_clear(&map->map, 0);
-}
-
-static inline void strintmap_partial_clear(struct strintmap *map)
+static inline void strintmap_clear(struct strintmap *map, unsigned flags)
 {
-	strmap_partial_clear(&map->map, 0);
+	/* maybe clear STRMAP_FREE_VALUES bit for extra protection */
+	strmap_clear(&map->map, flags);
 }
 
 static inline int strintmap_contains(struct strintmap *map, const char *str)
@@ -232,14 +226,9 @@ static inline void strset_init_with_options(struct strset *set,
 	strmap_init_with_options(&set->map, pool, strdup_strings);
 }
 
-static inline void strset_clear(struct strset *set)
-{
-	strmap_clear(&set->map, 0);
-}
-
-static inline void strset_partial_clear(struct strset *set)
+static inline void strset_clear(struct strset *set, unsigned flags)
 {
-	strmap_partial_clear(&set->map, 0);
+	strmap_clear(&set->map, flags);
 }
 
 static inline int strset_contains(struct strset *set, const char *str)
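
For readers skimming the diff, the combined-flags idea can be sketched
outside of Git. This is only a minimal illustration under stated
assumptions: the two flag values are copied from the patch above, while
toy_map and its helpers are hypothetical stand-ins and do not reflect
the real hashmap-backed strmap.

```c
#include <assert.h>
#include <stdlib.h>

/* Flag values copied from the patch; everything named toy_* is hypothetical. */
#define STRMAP_FREE_VALUES 1
#define STRMAP_PARTIAL_CLEAR 2

struct toy_map {
	void **values; /* stand-in for the hashmap table */
	size_t alloc;  /* slots allocated in values[] */
	size_t nr;     /* slots currently in use */
};

static void toy_map_add(struct toy_map *map, void *value)
{
	if (map->nr == map->alloc) {
		size_t new_alloc = map->alloc ? 2 * map->alloc : 4;
		void **grown = realloc(map->values, new_alloc * sizeof(*grown));
		if (!grown)
			abort();
		map->values = grown;
		map->alloc = new_alloc;
	}
	map->values[map->nr++] = value;
}

/* One entry point; each flag switches on one behaviour independently. */
static void toy_map_clear(struct toy_map *map, unsigned flags)
{
	size_t i;

	if (flags & STRMAP_FREE_VALUES)
		for (i = 0; i < map->nr; i++)
			free(map->values[i]);
	map->nr = 0;

	if (flags & STRMAP_PARTIAL_CLEAR)
		return; /* keep the table allocated and pre-sized for reuse */

	free(map->values);
	map->values = NULL;
	map->alloc = 0;
}

/* Returns 0 if both clearing modes behave as described. */
static int toy_map_demo(void)
{
	struct toy_map map = { NULL, 0, 0 };
	int borrowed = 42;

	toy_map_add(&map, malloc(16)); /* value owned by the map */
	toy_map_clear(&map, STRMAP_FREE_VALUES | STRMAP_PARTIAL_CLEAR);
	if (map.nr != 0 || map.alloc == 0)
		return 1; /* entries gone, table must survive */

	toy_map_add(&map, &borrowed); /* value not owned by the map */
	toy_map_clear(&map, 0);
	if (map.values != NULL || map.alloc != 0)
		return 2; /* a full clear must release the table */
	return 0;
}
```

The point of the sketch is that callers name the behaviour they want
(STRMAP_FREE_VALUES, STRMAP_PARTIAL_CLEAR) instead of passing a bare 0
or 1, at the cost of a flags argument that is 0 for most callers.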


* Re: [PATCH v2 0/7] Final optimization batch (#15): use memory pools
  2021-07-29 16:20   ` Jeff King
  2021-07-29 16:23     ` Jeff King
@ 2021-07-29 20:46     ` Elijah Newren
  2021-07-29 21:14       ` Jeff King
  1 sibling, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-07-29 20:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee

On Thu, Jul 29, 2021 at 10:20 AM Jeff King <peff@peff.net> wrote:
>
> On Thu, Jul 29, 2021 at 03:58:34AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > This series is more about strmaps & memory pools than merge logic. CC'ing
> > Peff since he reviewed the strmap work[1], and that work included a number
> > of decisions that specifically had this series in mind.
>
> I haven't been following the other optimization threads very closely,
> but I'll try to give my general impressions.
>
> > === Basic Optimization idea ===
> >
> > In this series, I make use of memory pools to get faster allocations and
> > deallocations for many data structures that tend to all be deallocated at
> > the same time anyway.
> >
> > === Results ===
> >
> > For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
> > performance work; instrument with trace2_region_* calls", 2020-10-28), the
> > changes in just this series improves the performance as follows:
> >
> >                      Before Series           After Series
> > no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
> > mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
> > just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms
>
> Pretty good results for the mega-renames case. I do wonder how much this
> matters in practice. That case is intentionally stressing the system,
> though I guess it's not too far-fetched (it's mostly a big directory
> rename). However, just "git checkout" across the rename already takes
> more than a second. So on the one hand, 400ms isn't nothing. On the
> other, I doubt anybody is likely to notice in the grand scheme of
> things.

The mega-renames case was spurred by looking at a repository at
$DAYJOB and trying to generate a similar testcase on a well-known open
source repository with approximately the same number of files.  The
Linux kernel was handy for that.  Technically, to make it match the
right number of renames, I would have needed to rename more toplevel
directories, but it was close enough and I liked it being a simple
testcase.  So, to me, the testcase is more unusual in the number of
patches being rebased rather than in the number of renames, but I
chose a long sequence of patches (with lots of different types of
changes) because that served as a good correctness case and even ended
up being good for spurring searches for tweaks to existing
optimizations.  And I added the just-one-mega to see how a simple
cherry-pick would work with the large number of renames.  It too has a
good relative speedup.

You make fair points about the absolute timings for rebase, but the
"grand scheme of things" involves usecases outside of traditional
rebases.  Some of those involve far more merges, making the relative
timings more important.  They also involve much less overhead -- not
only do they get to ignore the "git checkout" present in most rebases,
but they also get to exclude the index or worktree updating that is
present above.  Without that extra overhead, the relative improvement
from this patch is even greater. One particular example usecase that
is coming soon is the introduction of `git log --remerge-diff`; it
makes the timing of individual merges critical since it does so many
of them.  And it becomes even more important for users who change the
default from --diff-merges=off to --diff-merges=remerge-diff, because
then any `git log -p` will potentially remerge hundreds or thousands
of merge commits.  (I've got lots of users who have -p imply
--remerge-diff since last November.)

> And we're paying a non-trivial cost in code complexity to do it (though
> I do think you've done an admirable job of making that cost as low as
> possible). Dropping the USE_MEMORY_POOL flag and just always using a
> pool would make a lot of that complexity go away. I understand how it
> makes leak hunting harder, but I think simpler code would probably be
> worth the tradeoff (and in a sense, there _aren't_ leaks in an
> always-pool world; we're holding on to all of the memory through the
> whole operation).

Yeah, I had to keep the USE_MEMORY_POOL flag as I was rebasing my
sequence of optimization series to make sure I kept the intermediate
steps clean while upstreaming all the work.  I ended up just leaving
(a simplified form of) it in when I was all done, but it has probably
already served its purpose.  I'd be fine with dropping it.

> I assume your tests are just done using the regular glibc allocator. I

Yes.

> also wondered how plugging in a better allocator might fare. Here are
> timings I did of your mega-renames case with three binaries: one built
> with USE_MEMORY_POOL set to 0, one with it set to 1, and one with it set
> to 0 but adding "-ltcmalloc" to EXTLIBS via config.mak.
>
>   $ hyperfine \
>       -p 'git checkout hwmon-updates &&
>           git reset --hard fd8bdb23b91876ac1e624337bb88dc1dcc21d67e &&
>           git checkout 5.4-renames^0' \
>       -L version nopool,pool,tcmalloc \
>       './test-tool.{version} fast-rebase --onto HEAD base hwmon-updates'

Ooh, I didn't know about -L.

>   Benchmark #1: ./test-tool.nopool fast-rebase --onto HEAD base hwmon-updates
>     Time (mean ± σ):     921.1 ms ± 146.0 ms    [User: 843.0 ms, System: 77.5 ms]
>     Range (min … max):   660.9 ms … 1112.2 ms    10 runs
>
>   Benchmark #2: ./test-tool.pool fast-rebase --onto HEAD base hwmon-updates
>     Time (mean ± σ):     635.4 ms ± 125.5 ms    [User: 563.7 ms, System: 71.3 ms]
>     Range (min … max):   496.8 ms … 856.7 ms    10 runs
>
>   Benchmark #3: ./test-tool.tcmalloc fast-rebase --onto HEAD base hwmon-updates
>     Time (mean ± σ):     727.3 ms ± 139.9 ms    [User: 654.1 ms, System: 72.9 ms]
>     Range (min … max):   476.3 ms … 900.5 ms    10 runs

That's some _really_ wide variance on your runs, making me wonder if
you are running other things on your (I presume) laptop that are
potentially muddying the numbers.  Would the tcmalloc case actually
have the fastest run in general, or was it just lucky to hit a "quiet"
moment on the laptop?

Or perhaps my pre-warming script helps reduce variance more than I thought...

>   Summary
>     './test-tool.pool fast-rebase --onto HEAD base hwmon-updates' ran
>       1.14 ± 0.32 times faster than './test-tool.tcmalloc fast-rebase --onto HEAD base hwmon-updates'
>       1.45 ± 0.37 times faster than './test-tool.nopool fast-rebase --onto HEAD base hwmon-updates'
>
> The pool allocator does come out ahead when comparing means, but the
> improvement is within the noise (and the fastest run was actually with
> tcmalloc).
>
> I was also curious about peak heap usage. According to massif, the pool
> version peaks at ~800k extra (out of 82MB), which is negligible. Plus it
> has fewer overall allocations, so it seems to actually save 4-5MB in
> malloc overhead (though I would imagine that varies between allocators;
> I'm just going from massif numbers here).

I did similar testing a year ago, before I even looked at memory
pools.  I was surprised by how big a speedup I saw, and considered
asking on the list if we could push to use a different allocator by
default.  Ultimately, I figured that probably wouldn't fly and
distributors might override our choices anyway.  It was at that point
that I decided to start tweaking mem-pool.[ch] (which ended up getting
merged at edab8a8d07 ("Merge branch 'en/mem-pool'", 2020-08-27)), and
then integrating that into strmap/strset/strintmap -- all in an effort
to guarantee that we realized the speedups that I knew were possible
due to my testing with the special allocators.
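
As a rough illustration of why pooling helps when everything is freed
together, here is a minimal bump-allocator sketch. It is not Git's
mem-pool.[ch]; toy_pool, toy_pool_alloc() and toy_pool_discard() are
hypothetical names, and the 4096-byte block size is an arbitrary
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type used only to pick a safe alignment unit. */
union toy_align { long long ll; long double ld; void *p; };

/* Block header; the union pads it so the payload after it stays aligned. */
union toy_block {
	struct {
		union toy_block *next;
		size_t used; /* payload bytes handed out so far */
		size_t size; /* payload bytes available in this block */
	} hdr;
	union toy_align align;
};

struct toy_pool {
	union toy_block *head;
	size_t block_size;
	size_t nr_mallocs; /* how often we really called malloc() */
};

static void *toy_pool_alloc(struct toy_pool *pool, size_t len)
{
	const size_t unit = sizeof(union toy_align);
	union toy_block *b = pool->head;
	void *ret;

	len = (len + unit - 1) / unit * unit; /* round up to keep alignment */
	if (!b || b->hdr.size - b->hdr.used < len) {
		size_t size = len > pool->block_size ? len : pool->block_size;

		b = malloc(sizeof(*b) + size);
		if (!b)
			abort();
		b->hdr.next = pool->head;
		b->hdr.used = 0;
		b->hdr.size = size;
		pool->head = b;
		pool->nr_mallocs++;
	}
	ret = (char *)(b + 1) + b->hdr.used; /* bump the cursor, no bookkeeping */
	b->hdr.used += len;
	return ret;
}

/* Everything allocated from the pool is released block by block. */
static void toy_pool_discard(struct toy_pool *pool)
{
	while (pool->head) {
		union toy_block *next = pool->head->hdr.next;

		free(pool->head);
		pool->head = next;
	}
}

/* Allocates 1000 small objects; returns how many malloc() calls it took. */
static size_t toy_pool_demo(void)
{
	struct toy_pool pool = { NULL, 4096, 0 };
	size_t i, mallocs;

	for (i = 0; i < 1000; i++) {
		char *p = toy_pool_alloc(&pool, 32);
		p[0] = (char)i; /* touch the memory to show it is usable */
	}
	mallocs = pool.nr_mallocs;
	toy_pool_discard(&pool);
	return mallocs;
}
```

In this sketch the 1000 small allocations are served by a handful of
malloc() calls and released by the same number of free() calls, which
is the kind of saving described above.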

> So...I dunno. It's hard to assess the real-world impact of the speedup,
> compared to the complexity cost. Ultimately, this is changing code that
> you wrote and will probably maintain. So I'd leave the final decision
> for that tradeoff to you. I'm just injecting some thoughts and numbers. :)

It's nice to see others duplicate the results, and I appreciate the
sanity check on overall usefulness.  If it were just optimizing
rebase, I probably could have quit long ago when I had 15s rebases of
35 patches.  It was part stubbornness on my part wondering why it
couldn't be sub-second, and part knowing of other usecases that I
wanted to make attractive to others.  I want --remerge-diff to be
useful *and* practical.  I want rebasing/cherry-picking un-checked-out
branches to be useful and practical.  And I'd like to entice server
operators (GitHub, GitLab, etc.) to use the same merge machinery used
by end users so that reported merge-ability matches what end users see
when they try it themselves.

I think you make a good suggestion to drop the USE_MEMORY_POOL switch.
I think I'll do it as an additional patch at the end of the series,
just so it's easy for me to restore if by chance it's ever needed.


* Re: [PATCH v2 0/7] Final optimization batch (#15): use memory pools
  2021-07-29 19:46       ` Junio C Hamano
@ 2021-07-29 20:48         ` Junio C Hamano
  2021-07-29 21:05           ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-07-29 20:48 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren via GitGitGadget, git, Eric Sunshine, Elijah Newren,
	Derrick Stolee

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> On Thu, Jul 29, 2021 at 12:20:13PM -0400, Jeff King wrote:
>>
>>> I assume your tests are just done using the regular glibc allocator. I
>>> also wondered how plugging in a better allocator might fare. Here are
>>> timings I did of your mega-renames case with three binaries: one built
>>> with USE_MEMORY_POOL set to 0, one with it set to 1, and one with it set
>>> to 0 but adding "-ltcmalloc" to EXTLIBS via config.mak.
>>
>> Oh, btw, I wasn't able to apply your series from the list on top of
>> en/ort-perf-batch-14 (there are some problems in patch 4, and "am -3"
>> says my clone of git.git is missing some of the pre-image sha1s). I
>> fetched ort-perf-batch-15 from https://github.com/newren/git and timed
>> that, which I imagine is the same. But you may need to tweak the patches
>> so that Junio can pick them up.
>
> Thanks, but the batch #15 has been in 'seen' since 23rd ;-)

Oh, that is the previous round.  I haven't had the chance to pick up
this new round.



* Re: [PATCH v2 0/7] Final optimization batch (#15): use memory pools
  2021-07-29 20:48         ` Junio C Hamano
@ 2021-07-29 21:05           ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2021-07-29 21:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Elijah Newren via GitGitGadget, Git Mailing List,
	Eric Sunshine, Derrick Stolee

On Thu, Jul 29, 2021 at 2:48 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Jeff King <peff@peff.net> writes:
> >
> >> On Thu, Jul 29, 2021 at 12:20:13PM -0400, Jeff King wrote:
> >>
> >>> I assume your tests are just done using the regular glibc allocator. I
> >>> also wondered how plugging in a better allocator might fare. Here are
> >>> timings I did of your mega-renames case with three binaries: one built
> >>> with USE_MEMORY_POOL set to 0, one with it set to 1, and one with it set
> >>> to 0 but adding "-ltcmalloc" to EXTLIBS via config.mak.
> >>
> >> Oh, btw, I wasn't able to apply your series from the list on top of
> >> en/ort-perf-batch-14 (there are some problems in patch 4, and "am -3"
> >> says my clone of git.git is missing some of the pre-image sha1s). I
> >> fetched ort-perf-batch-15 from https://github.com/newren/git and timed
> >> that, which I imagine is the same. But you may need to tweak the patches
> >> so that Junio can pick them up.
> >
> > Thanks, but the batch #15 has been in 'seen' since 23rd ;-)
>
> Oh, that is the previous round.  I haven't had the chance to pick up
> this new round.

Oh, interesting.  At some point I had noticed that Junio based
en/ort-perf-batch-14 on top of a version of master that did not
include en/ort-perf-batch-12.  While that was fine from a correctness
point of view, it made my claims of speedups a bit weird and difficult
for others to reproduce as they'd need to merge some other series
first.  It looks like the base for en/ort-perf-batch-14 at some point
advanced to include a version of master that included
en/ort-perf-batch-12.

Anyway, I'm happy to re-roll this series and base it on
en/ort-perf-batch-14.  Peff has a few suggested improvements for me to
include in that re-roll.


* Re: [PATCH v2 0/7] Final optimization batch (#15): use memory pools
  2021-07-29 20:46     ` Elijah Newren
@ 2021-07-29 21:14       ` Jeff King
  0 siblings, 0 replies; 65+ messages in thread
From: Jeff King @ 2021-07-29 21:14 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee

On Thu, Jul 29, 2021 at 02:46:03PM -0600, Elijah Newren wrote:

> You make fair points about the absolute timings for rebase, but the
> "grand scheme of things" involves usecases outside of traditional
> rebases.  Some of those involve far more merges, making the relative
> timings more important.  They also involve much less overhead -- not
> only do they get to ignore the "git checkout" present in most rebases,
> but they also get to exclude the index or worktree updating that is
> present above.  Without that extra overhead, the relative improvement
> from this patch is even greater. One particular example usecase that
> is coming soon is the introduction of `git log --remerge-diff`; it
> makes the timing of individual merges critical since it does so many
> of them.  And it becomes even more important for users who change the
> default from --diff-merges=off to --diff-merges=remerge-diff, because
> then any `git log -p` will potentially remerge hundreds or thousands
> of merge commits.  (I've got lots of users who have -p imply
> --remerge-diff since last November.)

Ooh, I hadn't considered doing a bunch of fast in-memory merges for
--remerge-diff. That is a very compelling use case, I agree.

> >   Benchmark #1: ./test-tool.nopool fast-rebase --onto HEAD base hwmon-updates
> >     Time (mean ± σ):     921.1 ms ± 146.0 ms    [User: 843.0 ms, System: 77.5 ms]
> >     Range (min … max):   660.9 ms … 1112.2 ms    10 runs
> >
> >   Benchmark #2: ./test-tool.pool fast-rebase --onto HEAD base hwmon-updates
> >     Time (mean ± σ):     635.4 ms ± 125.5 ms    [User: 563.7 ms, System: 71.3 ms]
> >     Range (min … max):   496.8 ms … 856.7 ms    10 runs
> >
> >   Benchmark #3: ./test-tool.tcmalloc fast-rebase --onto HEAD base hwmon-updates
> >     Time (mean ± σ):     727.3 ms ± 139.9 ms    [User: 654.1 ms, System: 72.9 ms]
> >     Range (min … max):   476.3 ms … 900.5 ms    10 runs
> 
> That's some _really_ wide variance on your runs, making me wonder if
> you are running other things on your (I presume) laptop that are
> potentially muddying the numbers.  Would the tcmalloc case actually
> have the fastest run in general, or was it just lucky to hit a "quiet"
> moment on the laptop?

Yeah, I noticed that, too. The system was otherwise unloaded, but I
think a big part of it was that my prepare commands flipped back and forth
between the pre-/post-rename states twice. Even though that isn't
included in the timings, I think it was just creating a lot of delayed
work for the OS to do when it decided to write back all those inodes.

Switching it to:

  hyperfine \
    -p 'git checkout 5.4-renames^0 &&
        git branch -f hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e' \
    -L version nopool,pool,tcmalloc \
    './test-tool.{version} fast-rebase --onto HEAD base hwmon-updates'

produces much smoother results:

  Benchmark #1: ./test-tool.nopool fast-rebase --onto HEAD base hwmon-updates
    Time (mean ± σ):     649.7 ms ±   5.0 ms    [User: 595.1 ms, System: 54.3 ms]
    Range (min … max):   643.7 ms … 661.9 ms    10 runs
   
  Benchmark #2: ./test-tool.pool fast-rebase --onto HEAD base hwmon-updates
    Time (mean ± σ):     405.0 ms ±   3.0 ms    [User: 354.9 ms, System: 50.0 ms]
    Range (min … max):   401.0 ms … 411.9 ms    10 runs
   
  Benchmark #3: ./test-tool.tcmalloc fast-rebase --onto HEAD base hwmon-updates
    Time (mean ± σ):     476.7 ms ±   3.9 ms    [User: 430.1 ms, System: 46.4 ms]
    Range (min … max):   472.3 ms … 484.1 ms    10 runs
   
  Summary
    './test-tool.pool fast-rebase --onto HEAD base hwmon-updates' ran
      1.18 ± 0.01 times faster than './test-tool.tcmalloc fast-rebase --onto HEAD base hwmon-updates'
      1.60 ± 0.02 times faster than './test-tool.nopool fast-rebase --onto HEAD base hwmon-updates'

So the pool is definitively faster, though we can go a fair ways by
using a better allocator. :)
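
The "times faster" lines in the summary can be reproduced from the
means and standard deviations above. The sketch below assumes the
usual first-order error propagation for a ratio, which matches the
reported figures here. struct timing, toy_sqrt() and the other names
are hypothetical, and the hand-rolled square root is only there to
keep the example free of a libm dependency.

```c
#include <assert.h>

struct timing {
	double mean_ms;
	double stddev_ms;
};

/* Numbers copied from the three benchmark results above. */
static const struct timing nopool   = { 649.7, 5.0 };
static const struct timing pool     = { 405.0, 3.0 };
static const struct timing tcmalloc = { 476.7, 3.9 };

/* Newton's method; plenty accurate for the small values used here. */
static double toy_sqrt(double x)
{
	double guess = x > 1.0 ? x : 1.0;
	int i;

	if (x <= 0.0)
		return 0.0;
	for (i = 0; i < 60; i++)
		guess = 0.5 * (guess + x / guess);
	return guess;
}

/* How many times faster "fast" is than "slow", comparing means. */
static double speedup(struct timing slow, struct timing fast)
{
	return slow.mean_ms / fast.mean_ms;
}

/* Uncertainty of that ratio: combine the two relative deviations. */
static double speedup_stddev(struct timing slow, struct timing fast)
{
	double rel_slow = slow.stddev_ms / slow.mean_ms;
	double rel_fast = fast.stddev_ms / fast.mean_ms;

	return speedup(slow, fast) *
	       toy_sqrt(rel_slow * rel_slow + rel_fast * rel_fast);
}
```

With the numbers above, speedup(nopool, pool) is about 1.60 with an
uncertainty of about 0.02, and speedup(tcmalloc, pool) is about 1.18
with an uncertainty of about 0.01.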

> I did similar testing a year ago, before I even looked at memory
> pools.  I was surprised by how big a speedup I saw, and considered
> asking on the list if we could push to use a different allocator by
> default.  Ultimately, I figured that probably wouldn't fly and
> distributors might override our choices anyway.  It was at that point
> that I decided to start tweaking mem-pool.[ch] (which ended up getting
> merged at edab8a8d07 ("Merge branch 'en/mem-pool'", 2020-08-27)), and
> then integrating that into strmap/strset/strintmap -- all in an effort
> to guarantee that we realized the speedups that I knew were possible
> due to my testing with the special allocators.

Yeah, I think choice of allocator should be outside the scope of Git. It
would be a packaging issue if people want to squeeze out every bit. I do
agree there's something to be said for Git just handling this itself,
regardless of the platform or build. I'm just always on the lookout for
places where we can get free speedups without having to pay any
maintenance cost for them. :)

> I think you make a good suggestion to drop the USE_MEMORY_POOL switch.
> I think I'll do it as an additional patch at the end of the series,
> just so it's easy for me to restore if by chance it's ever needed.

Yeah, between that cleanup reducing the maintenance cost and your
compelling use cases above, I think I'm convinced that this is a good
direction.

-Peff


* Re: [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  2021-07-29 15:26       ` Jeff King
@ 2021-07-30  2:27         ` Elijah Newren
  2021-07-30 16:12           ` Jeff King
  0 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-07-30  2:27 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Jul 29, 2021 at 9:26 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jul 28, 2021 at 04:49:18PM -0600, Elijah Newren wrote:
>
> > On Mon, Jul 26, 2021 at 8:36 AM Derrick Stolee <stolee@gmail.com> wrote:
> > >
> > > On 7/23/2021 8:54 AM, Elijah Newren via GitGitGadget wrote:
> > > > From: Elijah Newren <newren@gmail.com>
> > > >
> > > > We need functions which will either call
> > > >     xmalloc, xcalloc, xstrndup
> > > > or
> > > >     mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
> > > > depending on whether we have a non-NULL memory pool.  Add these
> > > > functions; the next commit will make use of these.
> > >
> > > I briefly considered that this should just be the way the
> > > mem_pool_* methods work. It does rely on the caller knowing
> > > to free() the allocated memory when their pool is NULL, so
> > > perhaps such a universal change might be too much. What do
> > > you think?
> >
> > That's interesting, but I'm worried it might be a bit much.  Do others
> > on the list have an opinion here?
>
> FWIW, I had the same thought. You can also provide a helper to make the
> freeing side nicer:
>
>   static void mem_pool_free(struct mem_pool *m, void *ptr)
>   {
>         if (m)
>                 return; /* will be freed when pool frees */
>         free(ptr);
>   }
>
> We do something similar with unuse_commit_buffer(), where the caller
> isn't aware of whether we pulled the buffer from cache or allocated it
> especially for them.

Having a paired function may help one side, but I worry that the name
(mem_pool_free) might introduce some confusion of its own -- "Why is
there a mem_pool_free() function, isn't the point of memory pools to
not need to individually free things?"  Or, "Why are they freeing the
pool here and what's the extra parameter?"

I'm not sure I see the right way to address that, so I think I'm going
to leave this part out of my series and let someone else add such
changes on top if they feel motivated to do so.
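
For reference, the "use the pool if there is one, otherwise the heap"
pattern under discussion, together with a paired release helper, might
look roughly like the sketch below. All names are hypothetical
(toy_pool, toy_strndup, toy_free); the arena is a fixed-size stand-in
for struct mem_pool that only handles strings, so it does no alignment
work, and the caller is assumed to pass a length no larger than
strlen(str).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size arena standing in for a real memory pool. */
struct toy_pool {
	char buf[1024];
	size_t used;
};

static void *toy_pool_alloc(struct toy_pool *pool, size_t len)
{
	void *ret;

	if (len > sizeof(pool->buf) - pool->used)
		abort(); /* a real pool would grow instead */
	ret = pool->buf + pool->used;
	pool->used += len;
	return ret;
}

/* Use the pool when there is one, otherwise fall back to the heap. */
static char *toy_strndup(struct toy_pool *pool, const char *str, size_t len)
{
	char *ret = pool ? toy_pool_alloc(pool, len + 1) : malloc(len + 1);

	if (!ret)
		abort();
	memcpy(ret, str, len); /* assumes len <= strlen(str) */
	ret[len] = '\0';
	return ret;
}

/* Paired release, shaped like the helper quoted above. */
static void toy_free(struct toy_pool *pool, void *ptr)
{
	if (pool)
		return; /* released in bulk when the pool goes away */
	free(ptr);
}

/* Returns 1 if both allocation paths produce the expected string. */
static int toy_wrapper_demo(void)
{
	struct toy_pool pool = { {0}, 0 };
	char *a = toy_strndup(&pool, "merge-ort", 5); /* pooled */
	char *b = toy_strndup(NULL, "merge-ort", 5);  /* heap */
	int ok = !strcmp(a, "merge") && !strcmp(b, "merge");

	toy_free(&pool, a); /* no-op: the pool owns it */
	toy_free(NULL, b);  /* real free() */
	return ok;
}
```

The sketch also shows the naming concern: toy_free() on a pooled
pointer does nothing, which is exactly what a reader might not expect
from a function with "free" in its name.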


* Re: [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools
  2021-07-29 20:09         ` Jeff King
@ 2021-07-30  2:30           ` Elijah Newren
  2021-07-30 16:12             ` Jeff King
  2021-07-30 13:30           ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-07-30  2:30 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee

On Thu, Jul 29, 2021 at 2:09 PM Jeff King <peff@peff.net> wrote:
>
> On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote:
>
> > > Arguably, the existence of these function indirections is perhaps a sign
> > > that the strmap API should provide a version of the clear functions that
> > > takes "partial / not-partial" as a parameter.
> >
> > Are you suggesting a modification of str{map,intmap,set}_clear() to
> > take an extra parameter, or removing the
> > str{map,intmap,set}_partial_clear() functions and introducing new
> > functions that take a partial/not-partial parameter?  I think you're
> > suggesting the latter, and that makes more sense to me...but I'm
> > drawing blanks trying to come up with a reasonable function name.
>
> It does seem a shame to add the "partial" parameter to strmap_clear(),
> just because most callers don't need it (so they end up with this
> inscrutable "0" parameter).
>
> What if there was a flags field? Then it could be combined with the
> free_values parameter. The result is kind of verbose in two ways:
>
>  - now strset_clear(), etc, need a "flags" parameter, which they didn't
>    before (and is just "0" most of the time!)
>
>  - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
>    That's a lot longer, though arguably it's easier to understand since
>    the boolean is explained.
>
> Having gone through the exercise, I am not sure it is actually making
> anything more readable (messy patch is below for reference).

Thanks for diving in.  Since it's not clear if it's helping, I'll just
take your earlier suggestion to rename the "strmap_func" variable to
"strmap_clear_func" instead.
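
A stripped-down sketch of that function-pointer selection, with the
clearer variable name, could look like the following. Only the idea of
picking between a full and a partial clear via "reinitialize" comes
from the patch; toy_map and the toy_* functions are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for a strmap: an entry count plus a table flag. */
struct toy_map {
	int nr;
	int table_allocated;
};

static void toy_clear(struct toy_map *map, int free_values)
{
	(void)free_values; /* values are not modelled in this sketch */
	map->nr = 0;
	map->table_allocated = 0; /* full clear releases the table */
}

static void toy_partial_clear(struct toy_map *map, int free_values)
{
	(void)free_values;
	map->nr = 0; /* table stays allocated for the next use */
}

static void toy_clear_or_reinit(struct toy_map *map, int reinitialize)
{
	/* The name says what kind of function the pointer holds. */
	void (*strmap_clear_func)(struct toy_map *, int) =
		reinitialize ? toy_partial_clear : toy_clear;

	strmap_clear_func(map, 0);
}

/* Returns 0 if both paths leave the map in the expected state. */
static int toy_clear_demo(void)
{
	struct toy_map map = { 3, 1 };

	toy_clear_or_reinit(&map, 1);
	if (map.nr != 0 || !map.table_allocated)
		return 1;

	map.nr = 3;
	toy_clear_or_reinit(&map, 0);
	if (map.nr != 0 || map.table_allocated)
		return 2;
	return 0;
}
```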

>
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index 3e7ab1ca82..dfbdba53da 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -242,7 +242,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>                 insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
>         }
>
> -       strset_clear(&dups);
> +       strset_clear(&dups, 0);
>         strbuf_release(&ident);
>         strbuf_release(&oneline);
>  }
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 7e6b3e1b14..0c960111d1 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -665,9 +665,10 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
>
>         strmap_for_each_entry(dir_rename_count, &iter, entry) {
>                 struct strintmap *counts = entry->value;
> -               strintmap_clear(counts);
> +               strintmap_clear(counts, 0);
>         }
> -       strmap_partial_clear(dir_rename_count, 1);
> +       strmap_clear(dir_rename_count,
> +                    STRMAP_FREE_VALUES | STRMAP_PARTIAL_CLEAR);
>  }
>
>  static void cleanup_dir_rename_info(struct dir_rename_info *info,
> @@ -683,15 +684,15 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
>                 return;
>
>         /* idx_map */
> -       strintmap_clear(&info->idx_map);
> +       strintmap_clear(&info->idx_map, 0);
>
>         /* dir_rename_guess */
> -       strmap_clear(&info->dir_rename_guess, 1);
> +       strmap_clear(&info->dir_rename_guess, STRMAP_FREE_VALUES);
>
>         /* relevant_source_dirs */
>         if (info->relevant_source_dirs &&
>             info->relevant_source_dirs != dirs_removed) {
> -               strintmap_clear(info->relevant_source_dirs);
> +               strintmap_clear(info->relevant_source_dirs, 0);
>                 FREE_AND_NULL(info->relevant_source_dirs);
>         }
>
> @@ -716,7 +717,7 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
>
>                 if (!strintmap_get(dirs_removed, source_dir)) {
>                         string_list_append(&to_remove, source_dir);
> -                       strintmap_clear(counts);
> +                       strintmap_clear(counts, 0);
>                         continue;
>                 }
>
> @@ -1045,8 +1046,8 @@ static int find_basename_matches(struct diff_options *options,
>                 }
>         }
>
> -       strintmap_clear(&sources);
> -       strintmap_clear(&dests);
> +       strintmap_clear(&sources, 0);
> +       strintmap_clear(&dests, 0);
>
>         return renames;
>  }
> @@ -1700,7 +1701,7 @@ void diffcore_rename_extended(struct diff_options *options,
>         FREE_AND_NULL(rename_src);
>         rename_src_nr = rename_src_alloc = 0;
>         if (break_idx) {
> -               strintmap_clear(break_idx);
> +               strintmap_clear(break_idx, 0);
>                 FREE_AND_NULL(break_idx);
>         }
>         trace2_region_leave("diff", "write back to queue", options->repo);
> diff --git a/merge-ort.c b/merge-ort.c
> index 0fb942692a..0765e23577 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -532,15 +532,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>  {
>         struct rename_info *renames = &opti->renames;
>         int i;
> -       void (*strmap_func)(struct strmap *, int) =
> -               reinitialize ? strmap_partial_clear : strmap_clear;
> -       void (*strintmap_func)(struct strintmap *) =
> -               reinitialize ? strintmap_partial_clear : strintmap_clear;
> -       void (*strset_func)(struct strset *) =
> -               reinitialize ? strset_partial_clear : strset_clear;
> +       unsigned flags = reinitialize ? STRMAP_PARTIAL_CLEAR : 0;
>
>         if (opti->pool)
> -               strmap_func(&opti->paths, 0);
> +               strmap_clear(&opti->paths, flags);
>         else {
>                 /*
>                  * We marked opti->paths with strdup_strings = 0, so that
> @@ -550,15 +545,15 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>                  * to these strings, it is time to deallocate them.
>                  */
>                 free_strmap_strings(&opti->paths);
> -               strmap_func(&opti->paths, 1);
> +               strmap_clear(&opti->paths, flags | STRMAP_FREE_VALUES);
>         }
>
>         /*
>          * All keys and values in opti->conflicted are a subset of those in
>          * opti->paths.  We don't want to deallocate anything twice, so we
>          * don't free the keys and we pass 0 for free_values.
>          */
> -       strmap_func(&opti->conflicted, 0);
> +       strmap_clear(&opti->conflicted, flags);
>
>         if (!opti->pool) {
>                 /*
> @@ -579,24 +574,24 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>
>         /* Free memory used by various renames maps */
>         for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
> -               strintmap_func(&renames->dirs_removed[i]);
> -               strmap_func(&renames->dir_renames[i], 0);
> -               strintmap_func(&renames->relevant_sources[i]);
> +               strintmap_clear(&renames->dirs_removed[i], flags);
> +               strmap_clear(&renames->dir_renames[i], flags);
> +               strintmap_clear(&renames->relevant_sources[i], flags);
>                 if (!reinitialize)
>                         assert(renames->cached_pairs_valid_side == 0);
>                 if (i != renames->cached_pairs_valid_side &&
>                     -1 != renames->cached_pairs_valid_side) {
> -                       strset_func(&renames->cached_target_names[i]);
> -                       strmap_func(&renames->cached_pairs[i], 1);
> -                       strset_func(&renames->cached_irrelevant[i]);
> +                       strset_clear(&renames->cached_target_names[i], flags);
> +                       strmap_clear(&renames->cached_pairs[i], flags | STRMAP_FREE_VALUES);
> +                       strset_clear(&renames->cached_irrelevant[i], flags);
>                         partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
>                         if (!reinitialize)
>                                 strmap_clear(&renames->dir_rename_count[i], 1);
>                 }
>         }
>         for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
> -               strintmap_func(&renames->deferred[i].possible_trivial_merges);
> -               strset_func(&renames->deferred[i].target_dirs);
> +               strintmap_clear(&renames->deferred[i].possible_trivial_merges, flags);
> +               strset_clear(&renames->deferred[i].target_dirs, flags);
>                 renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
>         }
>         renames->cached_pairs_valid_side = 0;
> @@ -1482,7 +1477,7 @@ static int handle_deferred_entries(struct merge_options *opt,
>                         if (ret < 0)
>                                 return ret;
>                 }
> -               strintmap_clear(&copy);
> +               strintmap_clear(&copy, 0);
>                 strintmap_for_each_entry(&renames->deferred[side].possible_trivial_merges,
>                                          &iter, entry) {
>                         const char *path = entry->key;
> diff --git a/strmap.c b/strmap.c
> index 4fb9f6100e..7343800df5 100644
> --- a/strmap.c
> +++ b/strmap.c
> @@ -37,10 +37,11 @@ void strmap_init_with_options(struct strmap *map,
>         map->strdup_strings = strdup_strings;
>  }
>
> -static void strmap_free_entries_(struct strmap *map, int free_values)
> +static void strmap_free_entries_(struct strmap *map, unsigned flags)
>  {
>         struct hashmap_iter iter;
>         struct strmap_entry *e;
> +       int free_values = flags & STRMAP_FREE_VALUES;
>
>         if (!map)
>                 return;
> @@ -64,16 +65,13 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
>         }
>  }
>
> -void strmap_clear(struct strmap *map, int free_values)
> +void strmap_clear(struct strmap *map, unsigned flags)
>  {
> -       strmap_free_entries_(map, free_values);
> -       hashmap_clear(&map->map);
> -}
> -
> -void strmap_partial_clear(struct strmap *map, int free_values)
> -{
> -       strmap_free_entries_(map, free_values);
> -       hashmap_partial_clear(&map->map);
> +       strmap_free_entries_(map, flags);
> +       if (flags & STRMAP_PARTIAL_CLEAR)
> +               hashmap_partial_clear(&map->map);
> +       else
> +               hashmap_clear(&map->map);
>  }
>
>  static struct strmap_entry *create_entry(struct strmap *map,
> diff --git a/strmap.h b/strmap.h
> index 1e152d832d..d03d451654 100644
> --- a/strmap.h
> +++ b/strmap.h
> @@ -46,16 +46,14 @@ void strmap_init_with_options(struct strmap *map,
>                               struct mem_pool *pool,
>                               int strdup_strings);
>
> -/*
> - * Remove all entries from the map, releasing any allocated resources.
> - */
> -void strmap_clear(struct strmap *map, int free_values);
> +#define STRMAP_FREE_VALUES 1 /* 1 for historical compat, but we should probably
> +                               update callers to use the correct name) */
> +#define STRMAP_PARTIAL_CLEAR 2
>
>  /*
> - * Similar to strmap_clear() but leaves map->map->table allocated and
> - * pre-sized so that subsequent uses won't need as many rehashings.
> + * Remove all entries from the map, releasing any allocated resources.
>   */
> -void strmap_partial_clear(struct strmap *map, int free_values);
> +void strmap_clear(struct strmap *map, unsigned flags);
>
>  /*
>   * Insert "str" into the map, pointing to "data".
> @@ -148,14 +146,10 @@ static inline void strintmap_init_with_options(struct strintmap *map,
>         map->default_value = default_value;
>  }
>
> -static inline void strintmap_clear(struct strintmap *map)
> -{
> -       strmap_clear(&map->map, 0);
> -}
> -
> -static inline void strintmap_partial_clear(struct strintmap *map)
> +static inline void strintmap_clear(struct strintmap *map, unsigned flags)
>  {
> -       strmap_partial_clear(&map->map, 0);
> +       /* maybe clear STRMAP_FREE_VALUES bit for extra protection */
> +       strmap_clear(&map->map, flags);
>  }
>
>  static inline int strintmap_contains(struct strintmap *map, const char *str)
> @@ -232,14 +226,9 @@ static inline void strset_init_with_options(struct strset *set,
>         strmap_init_with_options(&set->map, pool, strdup_strings);
>  }
>
> -static inline void strset_clear(struct strset *set)
> -{
> -       strmap_clear(&set->map, 0);
> -}
> -
> -static inline void strset_partial_clear(struct strset *set)
> +static inline void strset_clear(struct strset *set, unsigned flags)
>  {
> -       strmap_partial_clear(&set->map, 0);
> +       strmap_clear(&set->map, flags);
>  }
>
>  static inline int strset_contains(struct strset *set, const char *str)
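
The flag-based interface in the quoted diff folds the "clear" and "partial clear" variants into one function. A minimal standalone sketch of that pattern follows; `toy_map`, `toy_map_clear`, and the `TOY_*` flags are hypothetical names used only for illustration and are not part of strmap.h:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flags, mirroring the shape of the quoted proposal. */
#define TOY_FREE_VALUES   (1u << 0)
#define TOY_PARTIAL_CLEAR (1u << 1)

/* Hypothetical container; NOT git's struct strmap. */
struct toy_map {
	void **values; /* heap-allocated values */
	size_t nr;     /* number of live entries */
	size_t alloc;  /* capacity of values[] */
};

static void toy_map_clear(struct toy_map *m, unsigned flags)
{
	size_t i;

	assert(m);
	/* Optionally release the values the map points at. */
	if (flags & TOY_FREE_VALUES)
		for (i = 0; i < m->nr; i++)
			free(m->values[i]);
	m->nr = 0;

	/* A partial clear keeps the table allocated for reuse. */
	if (!(flags & TOY_PARTIAL_CLEAR)) {
		free(m->values);
		m->values = NULL;
		m->alloc = 0;
	}
}
```

With this shape a caller composes behaviour with a bitwise OR, for example `toy_map_clear(&m, TOY_FREE_VALUES | TOY_PARTIAL_CLEAR)`, instead of selecting between two functions through a function pointer.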


* [PATCH v3 0/9] Final optimization batch (#15): use memory pools
  2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-07-29 16:20   ` Jeff King
@ 2021-07-30 11:47   ` Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 1/9] merge-ort: rename str{map,intmap,set}_func() Elijah Newren via GitGitGadget
                       ` (9 more replies)
  9 siblings, 10 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-30 11:47 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren

This series textually depends on en/ort-perf-batch-14, but the ideas are
orthogonal to it and orthogonal to previous series. It can be reviewed
independently.

Changes since v1, addressing Eric's feedback:

 * Fixed a comment that became out-of-date in patch 1
 * Swapped commits 2 and 3 so that one can better motivate the other.

Changes since v2, addressing Peff's feedback:

 * Rebased on en/ort-perf-batch-14 (resolving a trivial conflict with the
   new string_list_init_nodup() usage)
 * Added a new preliminary patch renaming str*_func() to str*_clear_func()
 * Added a new final patch that hardcodes that we'll just use memory pools

=== Basic Optimization idea ===

In this series, I make use of memory pools to get faster allocations and
deallocations for many data structures that tend to all be deallocated at
the same time anyway.
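
As a rough illustration of why this helps, here is a minimal bump-allocator sketch. It is not git's mem_pool implementation; `toy_pool` and its functions are hypothetical names, and a real pool handles more cases (such as growing policies and calloc/strdup helpers):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical alignment unit; payloads are handed out in multiples of it. */
union toy_align {
	long long ll;
	long double ld;
	void *ptr;
};

/* Hypothetical toy pool; NOT git's struct mem_pool. */
struct toy_block {
	struct toy_block *next;
	size_t used, cap;       /* bytes used / available in data[] */
	union toy_align data[]; /* payload, suitably aligned */
};

struct toy_pool {
	struct toy_block *head;
	size_t block_size;
};

static void toy_pool_init(struct toy_pool *p, size_t block_size)
{
	assert(p);
	p->head = NULL;
	p->block_size = block_size;
}

/* Allocation is usually just a bounds check and a pointer bump. */
static void *toy_pool_alloc(struct toy_pool *p, size_t len)
{
	const size_t unit = sizeof(union toy_align);
	struct toy_block *b = p->head;
	void *ret;

	len = (len + unit - 1) / unit * unit; /* keep offsets aligned */
	if (!b || b->cap - b->used < len) {
		size_t cap = len > p->block_size ? len : p->block_size;

		b = malloc(sizeof(*b) + cap);
		if (!b)
			abort();
		b->next = p->head;
		b->used = 0;
		b->cap = cap;
		p->head = b;
	}
	ret = (char *)b->data + b->used;
	b->used += len;
	return ret;
}

/* Free every block at once; returns how many blocks were released. */
static size_t toy_pool_discard(struct toy_pool *p)
{
	size_t freed = 0;

	while (p->head) {
		struct toy_block *next = p->head->next;

		free(p->head);
		p->head = next;
		freed++;
	}
	return freed;
}
```

In this sketch, many small allocations cost only a handful of malloc() calls, and tearing everything down is one short loop over blocks rather than one free() per object.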

=== Results ===

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28), the
changes in just this series improve the performance as follows:

                     Before Series           After Series
no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms


As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


=== Overall Results across all optimization work ===

This is my final prepared optimization series. It might be worth reviewing
how my optimizations fared overall, comparing the original merge-recursive
timings with three things: how much merge-recursive improved (as a
side-effect of optimizing merge-ort), how much improvement we would have
gotten from a hypothetical infinite parallelization of rename detection, and
what I achieved at the end with merge-ort:

                               Timings

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename    merge-ort
                 v2.30.0      current     detection     current
                ----------   ---------   -----------   ---------
no-renames:       18.912 s    18.030 s     11.699 s     198.3 ms
mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms

                           Speedup factors

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
no-renames:         1           1.05         1.6           95
mega-renames:       1          16.5         29           9012
just-one-mega:      1          13.6         20            565
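
The speedup factors are simply the merge-recursive v2.30.0 timing divided by the timing in each column. A small sketch of that arithmetic, using the timings from the table above (the helper names `speedup`, `print_row`, and `report` are hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* Speedup factor: how many times faster "after" is than "before". */
static double speedup(double before_s, double after_s)
{
	assert(after_s > 0.0);
	return before_s / after_s;
}

/* Hypothetical helper: print one row of the merge-ort column. */
static void print_row(const char *name, double before_s, double after_s)
{
	printf("%-15s %8.1f\n", name, speedup(before_s, after_s));
}

/* Timings in seconds: merge-recursive v2.30.0 vs. merge-ort current. */
static void report(void)
{
	print_row("no-renames:", 18.912, 0.1983);
	print_row("mega-renames:", 5964.031, 0.6618);
	print_row("just-one-mega:", 149.583, 0.2646);
}
```

Rounded to the nearest integer, these ratios match the merge-ort column of the speedup table.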


And, for partial clone users:

             Factor reduction in number of objects needed

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
mega-renames:       1            1            1          181.3


=== Caveat ===

It may be worth noting, though, that my optimization numbers above for
merge-ort use test-tool fast-rebase. git rebase -s ort on the three
testcases above is 5-20 times slower (taking 3.835s, 6.798s, and 1.235s,
respectively). At this point, any further optimization work should go into
making a faster full-featured rebase by copying the ideas from fast-rebase:
avoid unnecessary process forking, avoid updating the index and working copy
until either the rebase is finished or you hit a conflict (and don't write
rebase metadata to disk until that point either), get rid of the glacially
slow revision walking of the upstream side of history (nuke
can_fast_forward(), make --reapply-cherry-picks the default) or at least
don't revision walk so many times (multiple calls to get_merge_bases in
can_fast_forward() plus an is_linear_history() walk, checking for upstream
cherry-picks, probably more), turn off per-commit hooks that probably should
have never been on anyway, etc.

Elijah Newren (9):
  merge-ort: rename str{map,intmap,set}_func()
  diffcore-rename: use a mem_pool for exact rename detection's hashmap
  merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  merge-ort: set up a memory pool
  merge-ort: switch our strmaps over to using memory pools
  diffcore-rename, merge-ort: add wrapper functions for filepair
    alloc/dealloc
  merge-ort: store filepairs and filespecs in our mem_pool
  merge-ort: reuse path strings in pool_alloc_filespec
  merge-ort: remove compile-time ability to turn off usage of memory
    pools

 diffcore-rename.c |  68 +++++++++++--
 diffcore.h        |   3 +
 merge-ort.c       | 238 ++++++++++++++++++++++++++++++++--------------
 3 files changed, 231 insertions(+), 78 deletions(-)


base-commit: 8b09a900a1f1f00d4deb04f567994ae8f1804b5e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-990%2Fnewren%2Fort-perf-batch-15-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-990/newren/ort-perf-batch-15-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/990

Range-diff vs v2:

  -:  ----------- >  1:  e075d985f26 merge-ort: rename str{map,intmap,set}_func()
  1:  ea08b34d29b =  2:  8416afa89fb diffcore-rename: use a mem_pool for exact rename detection's hashmap
  2:  fdfc2b93ba4 !  3:  2c0b90eaba5 merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
     @@ Metadata
       ## Commit message ##
          merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
      
     -    We need functions which will either call
     +    Make the code more flexible so that it can handle both being run with or
     +    without a memory pool by adding utility functions which will either call
              xmalloc, xcalloc, xstrndup
          or
              mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
     -    depending on whether we have a non-NULL memory pool.  Add these
     -    functions; a subsequent commit will make use of these.
     +    depending on whether we have a non-NULL memory pool.  A subsequent
     +    commit will make use of these.
     +
     +    (We will actually be dropping these functions soon and just assuming we
     +    always have a memory pool, but the flexibility was very useful during
     +    development of merge-ort so I want to be able to restore it if needed.)
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
  3:  c7150869107 =  4:  6646f6fd1ca merge-ort: set up a memory pool
  4:  dd8839b2843 !  5:  7c49aa601d0 merge-ort: switch our strmaps over to using memory pools
     @@ Commit message
      
       ## merge-ort.c ##
      @@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
     - 	void (*strset_func)(struct strset *) =
     + 	void (*strset_clear_func)(struct strset *) =
       		reinitialize ? strset_partial_clear : strset_clear;
       
      -	/*
     @@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_inte
      -	 * to deallocate them.
      -	 */
      -	free_strmap_strings(&opti->paths);
     --	strmap_func(&opti->paths, 1);
     +-	strmap_clear_func(&opti->paths, 1);
      +	if (opti->pool)
     -+		strmap_func(&opti->paths, 0);
     ++		strmap_clear_func(&opti->paths, 0);
      +	else {
      +		/*
      +		 * We marked opti->paths with strdup_strings = 0, so that
     @@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_inte
      +		 * to these strings, it is time to deallocate them.
      +		 */
      +		free_strmap_strings(&opti->paths);
     -+		strmap_func(&opti->paths, 1);
     ++		strmap_clear_func(&opti->paths, 1);
      +	}
       
       	/*
       	 * All keys and values in opti->conflicted are a subset of those in
      @@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
       	 */
     - 	strmap_func(&opti->conflicted, 0);
     + 	strmap_clear_func(&opti->conflicted, 0);
       
      -	/*
      -	 * opti->paths_to_free is similar to opti->paths; we created it with
     @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
       	 */
      -	strmap_init_with_options(&opt->priv->paths, NULL, 0);
      -	strmap_init_with_options(&opt->priv->conflicted, NULL, 0);
     --	string_list_init(&opt->priv->paths_to_free, 0);
     +-	string_list_init_nodup(&opt->priv->paths_to_free);
      +	strmap_init_with_options(&opt->priv->paths, pool, 0);
      +	strmap_init_with_options(&opt->priv->conflicted, pool, 0);
      +	if (!opt->priv->pool)
     -+		string_list_init(&opt->priv->paths_to_free, 0);
     ++		string_list_init_nodup(&opt->priv->paths_to_free);
       
       	/*
       	 * keys & strbufs in output will sometimes need to outlive "paths",
  5:  560800a80ef =  6:  08cf2498f96 diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
  6:  94d60c8a476 =  7:  4ffa5af8b57 merge-ort: store filepairs and filespecs in our mem_pool
  7:  fda885dabe6 =  8:  1556f0443c3 merge-ort: reuse path strings in pool_alloc_filespec
  -:  ----------- >  9:  de30dbac25e merge-ort: remove compile-time ability to turn off usage of memory pools

-- 
gitgitgadget


* [PATCH v3 1/9] merge-ort: rename str{map,intmap,set}_func()
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
@ 2021-07-30 11:47     ` Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 2/9] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-30 11:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

In order to make it clearer that these three variables holding a
function refer to functions that will clear the strmap/strintmap/strset,
rename them to str{map,intmap,set}_clear_func().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index e75b524153e..401a40247a3 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -519,11 +519,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 {
 	struct rename_info *renames = &opti->renames;
 	int i;
-	void (*strmap_func)(struct strmap *, int) =
+	void (*strmap_clear_func)(struct strmap *, int) =
 		reinitialize ? strmap_partial_clear : strmap_clear;
-	void (*strintmap_func)(struct strintmap *) =
+	void (*strintmap_clear_func)(struct strintmap *) =
 		reinitialize ? strintmap_partial_clear : strintmap_clear;
-	void (*strset_func)(struct strset *) =
+	void (*strset_clear_func)(struct strset *) =
 		reinitialize ? strset_partial_clear : strset_clear;
 
 	/*
@@ -534,14 +534,14 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	 * to deallocate them.
 	 */
 	free_strmap_strings(&opti->paths);
-	strmap_func(&opti->paths, 1);
+	strmap_clear_func(&opti->paths, 1);
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
 	 * opti->paths.  We don't want to deallocate anything twice, so we
 	 * don't free the keys and we pass 0 for free_values.
 	 */
-	strmap_func(&opti->conflicted, 0);
+	strmap_clear_func(&opti->conflicted, 0);
 
 	/*
 	 * opti->paths_to_free is similar to opti->paths; we created it with
@@ -559,24 +559,24 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 
 	/* Free memory used by various renames maps */
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		strintmap_func(&renames->dirs_removed[i]);
-		strmap_func(&renames->dir_renames[i], 0);
-		strintmap_func(&renames->relevant_sources[i]);
+		strintmap_clear_func(&renames->dirs_removed[i]);
+		strmap_clear_func(&renames->dir_renames[i], 0);
+		strintmap_clear_func(&renames->relevant_sources[i]);
 		if (!reinitialize)
 			assert(renames->cached_pairs_valid_side == 0);
 		if (i != renames->cached_pairs_valid_side &&
 		    -1 != renames->cached_pairs_valid_side) {
-			strset_func(&renames->cached_target_names[i]);
-			strmap_func(&renames->cached_pairs[i], 1);
-			strset_func(&renames->cached_irrelevant[i]);
+			strset_clear_func(&renames->cached_target_names[i]);
+			strmap_clear_func(&renames->cached_pairs[i], 1);
+			strset_clear_func(&renames->cached_irrelevant[i]);
 			partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
 			if (!reinitialize)
 				strmap_clear(&renames->dir_rename_count[i], 1);
 		}
 	}
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		strintmap_func(&renames->deferred[i].possible_trivial_merges);
-		strset_func(&renames->deferred[i].target_dirs);
+		strintmap_clear_func(&renames->deferred[i].possible_trivial_merges);
+		strset_clear_func(&renames->deferred[i].target_dirs);
 		renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
 	}
 	renames->cached_pairs_valid_side = 0;
-- 
gitgitgadget



* [PATCH v3 2/9] diffcore-rename: use a mem_pool for exact rename detection's hashmap
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 1/9] merge-ort: rename str{map,intmap,set}_func() Elijah Newren via GitGitGadget
@ 2021-07-30 11:47     ` Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 3/9] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
                       ` (7 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-30 11:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Exact rename detection, via insert_file_table(), uses a hashmap to store
files by oid.  Use a mem_pool for the hashmap entries so these can all be
allocated and deallocated together.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      204.2  ms ±  3.0  ms   202.5  ms ±  3.2  ms
    mega-renames:      1.076 s ±  0.015 s     1.072 s ±  0.012 s
    just-one-mega:   364.1  ms ±  7.0  ms   357.3  ms ±  3.9  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 4ef0459cfb5..73d884099eb 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -317,10 +317,11 @@ static int find_identical_files(struct hashmap *srcs,
 }
 
 static void insert_file_table(struct repository *r,
+			      struct mem_pool *pool,
 			      struct hashmap *table, int index,
 			      struct diff_filespec *filespec)
 {
-	struct file_similarity *entry = xmalloc(sizeof(*entry));
+	struct file_similarity *entry = mem_pool_alloc(pool, sizeof(*entry));
 
 	entry->index = index;
 	entry->filespec = filespec;
@@ -336,7 +337,8 @@ static void insert_file_table(struct repository *r,
  * and then during the second round we try to match
  * cache-dirty entries as well.
  */
-static int find_exact_renames(struct diff_options *options)
+static int find_exact_renames(struct diff_options *options,
+			      struct mem_pool *pool)
 {
 	int i, renames = 0;
 	struct hashmap file_table;
@@ -346,7 +348,7 @@ static int find_exact_renames(struct diff_options *options)
 	 */
 	hashmap_init(&file_table, NULL, NULL, rename_src_nr);
 	for (i = rename_src_nr-1; i >= 0; i--)
-		insert_file_table(options->repo,
+		insert_file_table(options->repo, pool,
 				  &file_table, i,
 				  rename_src[i].p->one);
 
@@ -354,8 +356,8 @@ static int find_exact_renames(struct diff_options *options)
 	for (i = 0; i < rename_dst_nr; i++)
 		renames += find_identical_files(&file_table, i, options);
 
-	/* Free the hash data structure and entries */
-	hashmap_clear_and_free(&file_table, struct file_similarity, entry);
+	/* Free the hash data structure (entries will be freed with the pool) */
+	hashmap_clear(&file_table);
 
 	return renames;
 }
@@ -1341,6 +1343,7 @@ void diffcore_rename_extended(struct diff_options *options,
 	int num_destinations, dst_cnt;
 	int num_sources, want_copies;
 	struct progress *progress = NULL;
+	struct mem_pool local_pool;
 	struct dir_rename_info info;
 	struct diff_populate_filespec_options dpf_options = {
 		.check_binary = 0,
@@ -1409,11 +1412,18 @@ void diffcore_rename_extended(struct diff_options *options,
 		goto cleanup; /* nothing to do */
 
 	trace2_region_enter("diff", "exact renames", options->repo);
+	mem_pool_init(&local_pool, 32*1024);
 	/*
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
-	rename_count = find_exact_renames(options);
+	rename_count = find_exact_renames(options, &local_pool);
+	/*
+	 * Discard local_pool immediately instead of at "cleanup:" in order
+	 * to reduce maximum memory usage; inexact rename detection uses up
+	 * a fair amount of memory, and mem_pools can too.
+	 */
+	mem_pool_discard(&local_pool, 0);
 	trace2_region_leave("diff", "exact renames", options->repo);
 
 	/* Did we only want exact renames? */
-- 
gitgitgadget



* [PATCH v3 3/9] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 1/9] merge-ort: rename str{map,intmap,set}_func() Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 2/9] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
@ 2021-07-30 11:47     ` Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 4/9] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-30 11:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Make the code more flexible so that it can run either with or without a
memory pool by adding utility functions which will either call
    xmalloc, xcalloc, xstrndup
or
    mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
depending on whether we have a non-NULL memory pool.  A subsequent
commit will make use of these.

(We will actually be dropping these functions soon and just assuming we
always have a memory pool, but the flexibility was very useful during
development of merge-ort so I want to be able to restore it if needed.)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 401a40247a3..63f67246d3d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -664,6 +664,30 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
+MAYBE_UNUSED
+static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
+{
+	if (!pool)
+		return xcalloc(count, size);
+	return mem_pool_calloc(pool, count, size);
+}
+
+MAYBE_UNUSED
+static void *pool_alloc(struct mem_pool *pool, size_t size)
+{
+	if (!pool)
+		return xmalloc(size);
+	return mem_pool_alloc(pool, size);
+}
+
+MAYBE_UNUSED
+static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
+{
+	if (!pool)
+		return xstrndup(str, len);
+	return mem_pool_strndup(pool, str, len);
+}
+
 /* add a string to a strbuf, but converting "/" to "_" */
 static void add_flattened_path(struct strbuf *out, const char *s)
 {
-- 
gitgitgadget
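
One consequence of wrappers like these is that callers must also free conditionally: memory that came from a pool is released when the pool is discarded, while memory from the malloc fallback still needs an individual free(). A minimal sketch of that ownership rule, assuming a hypothetical fixed-size `toy_arena` in place of git's struct mem_pool (`toy_strndup` and `toy_release` are illustrative names, not functions from merge-ort.c):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size arena standing in for a real memory pool. */
struct toy_arena {
	char buf[4096];
	size_t used;
};

static void *toy_arena_alloc(struct toy_arena *a, size_t len)
{
	void *ret;

	if (len > sizeof(a->buf) - a->used)
		abort(); /* a real pool would grow instead */
	ret = a->buf + a->used;
	a->used += len;
	return ret;
}

/* Copy at most len bytes of str, from the arena if one is given. */
static char *toy_strndup(struct toy_arena *a, const char *str, size_t len)
{
	size_t n = 0;
	char *ret;

	assert(str);
	while (n < len && str[n])
		n++;
	ret = a ? toy_arena_alloc(a, n + 1) : malloc(n + 1);
	if (!ret)
		abort();
	memcpy(ret, str, n);
	ret[n] = '\0';
	return ret;
}

/* Only heap allocations are freed one by one; arena memory is not. */
static void toy_release(struct toy_arena *a, void *ptr)
{
	if (!a)
		free(ptr);
}
```

The same "free only when there is no pool" branching shows up in the series wherever cleanup code checks whether a pool is in use.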



* [PATCH v3 4/9] merge-ort: set up a memory pool
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-07-30 11:47     ` [PATCH v3 3/9] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
@ 2021-07-30 11:47     ` Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 5/9] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
                       ` (5 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-30 11:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge-ort has a lot of data structures, and they all tend to be freed
together in clear_or_reinit_internal_opts().  Set up a memory pool to
allow us to make these allocations and deallocations faster.  Future
commits will adjust various callers to make use of this memory pool.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 63f67246d3d..3f425436263 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -37,6 +37,8 @@
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
 
+#define USE_MEMORY_POOL 1 /* faster, but obscures memory leak hunting */
+
 /*
  * We have many arrays of size 3.  Whenever we have such an array, the
  * indices refer to one of the sides of the three-way merge.  This is so
@@ -339,6 +341,17 @@ struct merge_options_internal {
 	 */
 	struct strmap conflicted;
 
+	/*
+	 * pool: memory pool for fast allocation/deallocation
+	 *
+	 * We allocate room for lots of filenames and auxiliary data
+	 * structures in merge_options_internal, and it tends to all be
+	 * freed together too.  Using a memory pool for these provides a
+	 * nice speedup.
+	 */
+	struct mem_pool internal_pool;
+	struct mem_pool *pool; /* NULL, or pointer to internal_pool */
+
 	/*
 	 * paths_to_free: additional list of strings to free
 	 *
@@ -603,6 +616,12 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_clear(&opti->output, 0);
 	}
 
+#if USE_MEMORY_POOL
+	mem_pool_discard(&opti->internal_pool, 0);
+	if (!reinitialize)
+		opti->pool = NULL;
+#endif
+
 	/* Clean out callback_data as well. */
 	FREE_AND_NULL(renames->callback_data);
 	renames->callback_data_nr = renames->callback_data_alloc = 0;
@@ -4381,6 +4400,12 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	/* Initialization of various renames fields */
 	renames = &opt->priv->renames;
+#if USE_MEMORY_POOL
+	mem_pool_init(&opt->priv->internal_pool, 0);
+	opt->priv->pool = &opt->priv->internal_pool;
+#else
+	opt->priv->pool = NULL;
+#endif
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
 					    NOT_RELEVANT, NULL, 0);
-- 
gitgitgadget



* [PATCH v3 5/9] merge-ort: switch our strmaps over to using memory pools
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-07-30 11:47     ` [PATCH v3 4/9] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
@ 2021-07-30 11:47     ` Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 6/9] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-30 11:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For all the strmaps (including strintmaps and strsets) whose memory is
unconditionally freed as part of clear_or_reinit_internal_opts(), switch
them over to using our new memory pool.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      202.5  ms ±  3.2  ms    198.1 ms ±  2.6 ms
    mega-renames:      1.072 s ±  0.012 s    715.8 ms ±  4.0 ms
    just-one-mega:   357.3  ms ±  3.9  ms    276.8 ms ±  4.2 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 125 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 75 insertions(+), 50 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 3f425436263..99c75690855 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -539,15 +539,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	void (*strset_clear_func)(struct strset *) =
 		reinitialize ? strset_partial_clear : strset_clear;
 
-	/*
-	 * We marked opti->paths with strdup_strings = 0, so that we
-	 * wouldn't have to make another copy of the fullpath created by
-	 * make_traverse_path from setup_path_info().  But, now that we've
-	 * used it and have no other references to these strings, it is time
-	 * to deallocate them.
-	 */
-	free_strmap_strings(&opti->paths);
-	strmap_clear_func(&opti->paths, 1);
+	if (opti->pool)
+		strmap_clear_func(&opti->paths, 0);
+	else {
+		/*
+		 * We marked opti->paths with strdup_strings = 0, so that
+		 * we wouldn't have to make another copy of the fullpath
+		 * created by make_traverse_path from setup_path_info().
+		 * But, now that we've used it and have no other references
+		 * to these strings, it is time to deallocate them.
+		 */
+		free_strmap_strings(&opti->paths);
+		strmap_clear_func(&opti->paths, 1);
+	}
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
@@ -556,16 +560,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	 */
 	strmap_clear_func(&opti->conflicted, 0);
 
-	/*
-	 * opti->paths_to_free is similar to opti->paths; we created it with
-	 * strdup_strings = 0 to avoid making _another_ copy of the fullpath
-	 * but now that we've used it and have no other references to these
-	 * strings, it is time to deallocate them.  We do so by temporarily
-	 * setting strdup_strings to 1.
-	 */
-	opti->paths_to_free.strdup_strings = 1;
-	string_list_clear(&opti->paths_to_free, 0);
-	opti->paths_to_free.strdup_strings = 0;
+	if (!opti->pool) {
+		/*
+		 * opti->paths_to_free is similar to opti->paths; we
+		 * created it with strdup_strings = 0 to avoid making
+		 * _another_ copy of the fullpath but now that we've used
+		 * it and have no other references to these strings, it is
+		 * time to deallocate them.  We do so by temporarily
+		 * setting strdup_strings to 1.
+		 */
+		opti->paths_to_free.strdup_strings = 1;
+		string_list_clear(&opti->paths_to_free, 0);
+		opti->paths_to_free.strdup_strings = 0;
+	}
 
 	if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
 		discard_index(&opti->attr_index);
@@ -683,7 +690,6 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
-MAYBE_UNUSED
 static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 {
 	if (!pool)
@@ -691,7 +697,6 @@ static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 	return mem_pool_calloc(pool, count, size);
 }
 
-MAYBE_UNUSED
 static void *pool_alloc(struct mem_pool *pool, size_t size)
 {
 	if (!pool)
@@ -699,7 +704,6 @@ static void *pool_alloc(struct mem_pool *pool, size_t size)
 	return mem_pool_alloc(pool, size);
 }
 
-MAYBE_UNUSED
 static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
 {
 	if (!pool)
@@ -835,8 +839,9 @@ static void setup_path_info(struct merge_options *opt,
 	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
 	assert(resolved == (merged_version != NULL));
 
-	mi = xcalloc(1, resolved ? sizeof(struct merged_info) :
-				   sizeof(struct conflict_info));
+	mi = pool_calloc(opt->priv->pool, 1,
+			 resolved ? sizeof(struct merged_info) :
+				    sizeof(struct conflict_info));
 	mi->directory_name = current_dir_name;
 	mi->basename_offset = current_dir_name_len;
 	mi->clean = !!resolved;
@@ -1128,7 +1133,7 @@ static int collect_merge_info_callback(int n,
 	len = traverse_path_len(info, p->pathlen);
 
 	/* +1 in both of the following lines to include the NUL byte */
-	fullpath = xmalloc(len + 1);
+	fullpath = pool_alloc(opt->priv->pool, len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
@@ -1383,7 +1388,7 @@ static int handle_deferred_entries(struct merge_options *opt,
 		copy = renames->deferred[side].possible_trivial_merges;
 		strintmap_init_with_options(&renames->deferred[side].possible_trivial_merges,
 					    0,
-					    NULL,
+					    opt->priv->pool,
 					    0);
 		strintmap_for_each_entry(&copy, &iter, entry) {
 			const char *path = entry->key;
@@ -2335,12 +2340,21 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 	VERIFY_CI(ci);
 
 	/* Find parent directories missing from opt->priv->paths */
-	cur_path = new_path;
+	if (opt->priv->pool) {
+		cur_path = mem_pool_strdup(opt->priv->pool, new_path);
+		free((char*)new_path);
+		new_path = (char *)cur_path;
+	} else {
+		cur_path = new_path;
+	}
+
 	while (1) {
 		/* Find the parent directory of cur_path */
 		char *last_slash = strrchr(cur_path, '/');
 		if (last_slash) {
-			parent_name = xstrndup(cur_path, last_slash - cur_path);
+			parent_name = pool_strndup(opt->priv->pool,
+						   cur_path,
+						   last_slash - cur_path);
 		} else {
 			parent_name = opt->priv->toplevel_dir;
 			break;
@@ -2349,7 +2363,8 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		/* Look it up in opt->priv->paths */
 		entry = strmap_get_entry(&opt->priv->paths, parent_name);
 		if (entry) {
-			free((char*)parent_name);
+			if (!opt->priv->pool)
+				free((char*)parent_name);
 			parent_name = entry->key; /* reuse known pointer */
 			break;
 		}
@@ -2376,12 +2391,15 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		parent_name = cur_dir;
 	}
 
-	/*
-	 * We are removing old_path from opt->priv->paths.  old_path also will
-	 * eventually need to be freed, but it may still be used by e.g.
-	 * ci->pathnames.  So, store it in another string-list for now.
-	 */
-	string_list_append(&opt->priv->paths_to_free, old_path);
+	if (!opt->priv->pool) {
+		/*
+		 * We are removing old_path from opt->priv->paths.
+		 * old_path also will eventually need to be freed, but it
+		 * may still be used by e.g.  ci->pathnames.  So, store it
+		 * in another string-list for now.
+		 */
+		string_list_append(&opt->priv->paths_to_free, old_path);
+	}
 
 	assert(ci->filemask == 2 || ci->filemask == 4);
 	assert(ci->dirmask == 0);
@@ -2416,7 +2434,8 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		new_ci->stages[index].mode = ci->stages[index].mode;
 		oidcpy(&new_ci->stages[index].oid, &ci->stages[index].oid);
 
-		free(ci);
+		if (!opt->priv->pool)
+			free(ci);
 		ci = new_ci;
 	}
 
@@ -3623,7 +3642,8 @@ static void process_entry(struct merge_options *opt,
 		 * the directory to remain here, so we need to move this
 		 * path to some new location.
 		 */
-		CALLOC_ARRAY(new_ci, 1);
+		new_ci = pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
+
 		/* We don't really want new_ci->merged.result copied, but it'll
 		 * be overwritten below so it doesn't matter.  We also don't
 		 * want any directory mode/oid values copied, but we'll zero
@@ -3715,7 +3735,7 @@ static void process_entry(struct merge_options *opt,
 			const char *a_path = NULL, *b_path = NULL;
 			int rename_a = 0, rename_b = 0;
 
-			new_ci = xmalloc(sizeof(*new_ci));
+			new_ci = pool_alloc(opt->priv->pool, sizeof(*new_ci));
 
 			if (S_ISREG(a_mode))
 				rename_a = 1;
@@ -3788,12 +3808,14 @@ static void process_entry(struct merge_options *opt,
 				strmap_remove(&opt->priv->paths, path, 0);
 				/*
 				 * We removed path from opt->priv->paths.  path
-				 * will also eventually need to be freed, but
-				 * it may still be used by e.g.  ci->pathnames.
-				 * So, store it in another string-list for now.
+				 * will also eventually need to be freed if not
+				 * part of a memory pool...but it may still be
+				 * used by e.g. ci->pathnames.  So, store it in
+				 * another string-list for now in that case.
 				 */
-				string_list_append(&opt->priv->paths_to_free,
-						   path);
+				if (!opt->priv->pool)
+					string_list_append(&opt->priv->paths_to_free,
+							   path);
 			}
 
 			/*
@@ -4335,6 +4357,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 {
 	struct rename_info *renames;
 	int i;
+	struct mem_pool *pool = NULL;
 
 	/* Sanity checks on opt */
 	trace2_region_enter("merge", "sanity checks", opt->repo);
@@ -4406,9 +4429,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 #else
 	opt->priv->pool = NULL;
 #endif
+	pool = opt->priv->pool;
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
-					    NOT_RELEVANT, NULL, 0);
+					    NOT_RELEVANT, pool, 0);
 		strmap_init_with_options(&renames->dir_rename_count[i],
 					 NULL, 1);
 		strmap_init_with_options(&renames->dir_renames[i],
@@ -4422,7 +4446,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 		 */
 		strintmap_init_with_options(&renames->relevant_sources[i],
 					    -1 /* explicitly invalid */,
-					    NULL, 0);
+					    pool, 0);
 		strmap_init_with_options(&renames->cached_pairs[i],
 					 NULL, 1);
 		strset_init_with_options(&renames->cached_irrelevant[i],
@@ -4432,9 +4456,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	}
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->deferred[i].possible_trivial_merges,
-					    0, NULL, 0);
+					    0, pool, 0);
 		strset_init_with_options(&renames->deferred[i].target_dirs,
-					 NULL, 1);
+					 pool, 1);
 		renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
 	}
 
@@ -4447,9 +4471,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * In contrast, conflicted just has a subset of keys from paths, so
 	 * we don't want to free those (it'd be a duplicate free).
 	 */
-	strmap_init_with_options(&opt->priv->paths, NULL, 0);
-	strmap_init_with_options(&opt->priv->conflicted, NULL, 0);
-	string_list_init_nodup(&opt->priv->paths_to_free);
+	strmap_init_with_options(&opt->priv->paths, pool, 0);
+	strmap_init_with_options(&opt->priv->conflicted, pool, 0);
+	if (!opt->priv->pool)
+		string_list_init_nodup(&opt->priv->paths_to_free);
 
 	/*
 	 * keys & strbufs in output will sometimes need to outlive "paths",
-- 
gitgitgadget



* [PATCH v3 6/9] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-07-30 11:47     ` [PATCH v3 5/9] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
@ 2021-07-30 11:47     ` Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 7/9] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-30 11:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

We want to be able to allocate filespecs and filepairs using a mem_pool.
However, filespec data will still remain outside the pool (perhaps in
the future we could plumb the pool through the various diff APIs to
allocate the filespec data too, but for now we are limiting the scope).
Add some extra functions to allocate these appropriately based on the
non-NULL-ness of opt->priv->pool, as well as some extra functions to
handle correctly deallocating the relevant parts of them.  A future
commit will make use of these new functions.
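
As a rough illustration of the split described above (the struct may live
in a pool, while the data it points to stays on the heap), here is a
minimal sketch. Everything named toy_* is a hypothetical stand-in invented
for this example; it is not git's mem_pool or diff_filespec API, only the
same shape of logic under simplified assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not git's mem_pool or diff_filespec. */
struct toy_pool {
	_Alignas(16) unsigned char buf[4096];
	size_t used;
};

struct toy_spec {
	int count;	/* reference count, in the spirit of diff_filespec.count */
	char *data;	/* always malloc'ed, even when the spec lives in a pool */
};

/* Bump-allocate zeroed memory; the arena is released as a whole, never per object. */
static void *toy_pool_calloc(struct toy_pool *pool, size_t size)
{
	void *p;

	size = (size + 15) & ~(size_t)15;	/* keep later allocations aligned */
	assert(pool->used + size <= sizeof(pool->buf));
	p = pool->buf + pool->used;
	pool->used += size;
	memset(p, 0, size);
	return p;
}

/* Allocate from the pool when one is given, otherwise from the heap. */
static struct toy_spec *toy_alloc_spec(struct toy_pool *pool)
{
	struct toy_spec *spec;

	spec = pool ? toy_pool_calloc(pool, sizeof(*spec))
		    : calloc(1, sizeof(*spec));
	if (!spec)
		abort();
	spec->count = 1;
	return spec;
}

/* Drop one reference; free the data always, the struct only if heap-allocated. */
static void toy_free_spec(struct toy_pool *pool, struct toy_spec *spec)
{
	if (--spec->count)
		return;		/* still referenced elsewhere */
	free(spec->data);	/* the data is outside the pool in both modes */
	spec->data = NULL;
	if (!pool)
		free(spec);	/* pool-backed structs go away with the pool */
}
```

The point of the sketch is that the pool and non-pool paths share the
reference counting and the freeing of heap data, and differ only in
whether the struct itself is handed back to free().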

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 41 +++++++++++++++++++++++++++++++++++++++++
 diffcore.h        |  2 ++
 merge-ort.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 73d884099eb..5bc559f79e9 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1328,6 +1328,47 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
 	rename_src_nr = new_num_src;
 }
 
+static void free_filespec_data(struct diff_filespec *spec)
+{
+	if (!--spec->count)
+		diff_free_filespec_data(spec);
+}
+
+MAYBE_UNUSED
+static void pool_free_filespec(struct mem_pool *pool,
+			       struct diff_filespec *spec)
+{
+	if (!pool) {
+		free_filespec(spec);
+		return;
+	}
+
+	/*
+	 * Similar to free_filespec(), but only frees the data.  The spec
+	 * itself was allocated in the pool and should not be individually
+	 * freed.
+	 */
+	free_filespec_data(spec);
+}
+
+MAYBE_UNUSED
+void pool_diff_free_filepair(struct mem_pool *pool,
+			     struct diff_filepair *p)
+{
+	if (!pool) {
+		diff_free_filepair(p);
+		return;
+	}
+
+	/*
+	 * Similar to diff_free_filepair() but only frees the data from the
+	 * filespecs; not the filespecs or the filepair which were
+	 * allocated from the pool.
+	 */
+	free_filespec_data(p->one);
+	free_filespec_data(p->two);
+}
+
 void diffcore_rename_extended(struct diff_options *options,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
diff --git a/diffcore.h b/diffcore.h
index 533b30e21e7..b58ee6b1934 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -127,6 +127,8 @@ struct diff_filepair {
 #define DIFF_PAIR_MODE_CHANGED(p) ((p)->one->mode != (p)->two->mode)
 
 void diff_free_filepair(struct diff_filepair *);
+void pool_diff_free_filepair(struct mem_pool *pool,
+			     struct diff_filepair *p);
 
 int diff_unmodified_pair(struct diff_filepair *);
 
diff --git a/merge-ort.c b/merge-ort.c
index 99c75690855..e79830f9181 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -690,6 +690,48 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
+MAYBE_UNUSED
+static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
+						 const char *path)
+{
+	struct diff_filespec *spec;
+	size_t len;
+
+	if (!pool)
+		return alloc_filespec(path);
+
+	/* Same code as alloc_filespec, except allocate from pool */
+	len = strlen(path);
+
+	spec = mem_pool_calloc(pool, 1, st_add3(sizeof(*spec), len, 1));
+	memcpy(spec+1, path, len);
+	spec->path = (void*)(spec+1);
+
+	spec->count = 1;
+	spec->is_binary = -1;
+	return spec;
+}
+
+MAYBE_UNUSED
+static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
+					     struct diff_queue_struct *queue,
+					     struct diff_filespec *one,
+					     struct diff_filespec *two)
+{
+	struct diff_filepair *dp;
+
+	if (!pool)
+		return diff_queue(queue, one, two);
+
+	/* Same code as diff_queue, except allocate from pool */
+	dp = mem_pool_calloc(pool, 1, sizeof(*dp));
+	dp->one = one;
+	dp->two = two;
+	if (queue)
+		diff_q(queue, dp);
+	return dp;
+}
+
 static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 {
 	if (!pool)
-- 
gitgitgadget



* [PATCH v3 7/9] merge-ort: store filepairs and filespecs in our mem_pool
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-07-30 11:47     ` [PATCH v3 6/9] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
@ 2021-07-30 11:47     ` Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 8/9] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-30 11:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.1 ms ±  2.6 ms     198.5 ms ±  3.4 ms
    mega-renames:     715.8 ms ±  4.0 ms     679.1 ms ±  5.6 ms
    just-one-mega:    276.8 ms ±  4.2 ms     271.9 ms ±  2.8 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c |  9 ++++-----
 diffcore.h        |  1 +
 merge-ort.c       | 26 ++++++++++++++------------
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 5bc559f79e9..7e6b3e1b143 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1334,7 +1334,6 @@ static void free_filespec_data(struct diff_filespec *spec)
 		diff_free_filespec_data(spec);
 }
 
-MAYBE_UNUSED
 static void pool_free_filespec(struct mem_pool *pool,
 			       struct diff_filespec *spec)
 {
@@ -1351,7 +1350,6 @@ static void pool_free_filespec(struct mem_pool *pool,
 	free_filespec_data(spec);
 }
 
-MAYBE_UNUSED
 void pool_diff_free_filepair(struct mem_pool *pool,
 			     struct diff_filepair *p)
 {
@@ -1370,6 +1368,7 @@ void pool_diff_free_filepair(struct mem_pool *pool,
 }
 
 void diffcore_rename_extended(struct diff_options *options,
+			      struct mem_pool *pool,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
 			      struct strmap *dir_rename_count,
@@ -1683,7 +1682,7 @@ void diffcore_rename_extended(struct diff_options *options,
 			pair_to_free = p;
 
 		if (pair_to_free)
-			diff_free_filepair(pair_to_free);
+			pool_diff_free_filepair(pool, pair_to_free);
 	}
 	diff_debug_queue("done copying original", &outq);
 
@@ -1693,7 +1692,7 @@ void diffcore_rename_extended(struct diff_options *options,
 
 	for (i = 0; i < rename_dst_nr; i++)
 		if (rename_dst[i].filespec_to_free)
-			free_filespec(rename_dst[i].filespec_to_free);
+			pool_free_filespec(pool, rename_dst[i].filespec_to_free);
 
 	cleanup_dir_rename_info(&info, dirs_removed, dir_rename_count != NULL);
 	FREE_AND_NULL(rename_dst);
@@ -1710,5 +1709,5 @@ void diffcore_rename_extended(struct diff_options *options,
 
 void diffcore_rename(struct diff_options *options)
 {
-	diffcore_rename_extended(options, NULL, NULL, NULL, NULL);
+	diffcore_rename_extended(options, NULL, NULL, NULL, NULL, NULL);
 }
diff --git a/diffcore.h b/diffcore.h
index b58ee6b1934..badc2261c20 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -181,6 +181,7 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count);
 void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
 void diffcore_rename_extended(struct diff_options *options,
+			      struct mem_pool *pool,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
 			      struct strmap *dir_rename_count,
diff --git a/merge-ort.c b/merge-ort.c
index e79830f9181..f4f0a3d57f0 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -690,7 +690,6 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
-MAYBE_UNUSED
 static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 						 const char *path)
 {
@@ -712,7 +711,6 @@ static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 	return spec;
 }
 
-MAYBE_UNUSED
 static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
 					     struct diff_queue_struct *queue,
 					     struct diff_filespec *one,
@@ -930,6 +928,7 @@ static void add_pair(struct merge_options *opt,
 		     unsigned dir_rename_mask)
 {
 	struct diff_filespec *one, *two;
+	struct mem_pool *pool = opt->priv->pool;
 	struct rename_info *renames = &opt->priv->renames;
 	int names_idx = is_add ? side : 0;
 
@@ -980,11 +979,11 @@ static void add_pair(struct merge_options *opt,
 			return;
 	}
 
-	one = alloc_filespec(pathname);
-	two = alloc_filespec(pathname);
+	one = pool_alloc_filespec(pool, pathname);
+	two = pool_alloc_filespec(pool, pathname);
 	fill_filespec(is_add ? two : one,
 		      &names[names_idx].oid, 1, names[names_idx].mode);
-	diff_queue(&renames->pairs[side], one, two);
+	pool_diff_queue(pool, &renames->pairs[side], one, two);
 }
 
 static void collect_rename_info(struct merge_options *opt,
@@ -2893,6 +2892,7 @@ static void use_cached_pairs(struct merge_options *opt,
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *entry;
+	struct mem_pool *pool = opt->priv->pool;
 
 	/*
 	 * Add to side_pairs all entries from renames->cached_pairs[side_index].
@@ -2906,9 +2906,9 @@ static void use_cached_pairs(struct merge_options *opt,
 			new_name = old_name;
 
 		/* We don't care about oid/mode, only filenames and status */
-		one = alloc_filespec(old_name);
-		two = alloc_filespec(new_name);
-		diff_queue(pairs, one, two);
+		one = pool_alloc_filespec(pool, old_name);
+		two = pool_alloc_filespec(pool, new_name);
+		pool_diff_queue(pool, pairs, one, two);
 		pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D';
 	}
 }
@@ -3016,6 +3016,7 @@ static int detect_regular_renames(struct merge_options *opt,
 	diff_queued_diff = renames->pairs[side_index];
 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diffcore_rename_extended(&diff_opts,
+				 opt->priv->pool,
 				 &renames->relevant_sources[side_index],
 				 &renames->dirs_removed[side_index],
 				 &renames->dir_rename_count[side_index],
@@ -3066,7 +3067,7 @@ static int collect_renames(struct merge_options *opt,
 
 		if (p->status != 'A' && p->status != 'R') {
 			possibly_cache_new_pair(renames, p, side_index, NULL);
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 			continue;
 		}
 
@@ -3079,7 +3080,7 @@ static int collect_renames(struct merge_options *opt,
 
 		possibly_cache_new_pair(renames, p, side_index, new_path);
 		if (p->status != 'R' && !new_path) {
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 			continue;
 		}
 
@@ -3197,7 +3198,7 @@ cleanup:
 		side_pairs = &renames->pairs[s];
 		for (i = 0; i < side_pairs->nr; ++i) {
 			struct diff_filepair *p = side_pairs->queue[i];
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 		}
 	}
 
@@ -3210,7 +3211,8 @@ simple_cleanup:
 	if (combined.nr) {
 		int i;
 		for (i = 0; i < combined.nr; i++)
-			diff_free_filepair(combined.queue[i]);
+			pool_diff_free_filepair(opt->priv->pool,
+						combined.queue[i]);
 		free(combined.queue);
 	}
 
-- 
gitgitgadget



* [PATCH v3 8/9] merge-ort: reuse path strings in pool_alloc_filespec
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-07-30 11:47     ` [PATCH v3 7/9] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
@ 2021-07-30 11:47     ` Elijah Newren via GitGitGadget
  2021-07-30 11:47     ` [PATCH v3 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools Elijah Newren via GitGitGadget
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
  9 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-30 11:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

pool_alloc_filespec() was written so that the code when pool != NULL
mimicked the code from alloc_filespec(), which included allocating
enough extra space for the path and then copying it.  However, the path
passed to pool_alloc_filespec() is always going to already be in the
same memory pool, so we may as well reuse it instead of copying it.
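
The difference between copying the path and reusing it can be sketched
as follows. The toy_* names are hypothetical and exist only for this
example; the real functions are alloc_filespec() and
pool_alloc_filespec(), whose details differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not git's real types. */
struct toy_pool {
	_Alignas(16) unsigned char buf[4096];
	size_t used;
};

struct toy_spec {
	const char *path;
	int count;
};

/* Bump-allocate zeroed memory from the arena. */
static void *toy_pool_calloc(struct toy_pool *pool, size_t size)
{
	void *p;

	size = (size + 15) & ~(size_t)15;	/* keep later allocations aligned */
	assert(pool->used + size <= sizeof(pool->buf));
	p = pool->buf + pool->used;
	pool->used += size;
	memset(p, 0, size);
	return p;
}

static char *toy_pool_strdup(struct toy_pool *pool, const char *s)
{
	size_t len = strlen(s) + 1;

	return memcpy(toy_pool_calloc(pool, len), s, len);
}

/* Variant 1: copy the path into extra space placed right after the struct. */
static struct toy_spec *toy_spec_copy_path(struct toy_pool *pool,
					   const char *path)
{
	size_t len = strlen(path);
	struct toy_spec *spec = toy_pool_calloc(pool, sizeof(*spec) + len + 1);

	memcpy(spec + 1, path, len);	/* NUL comes from the zeroed allocation */
	spec->path = (const char *)(spec + 1);
	spec->count = 1;
	return spec;
}

/* Variant 2: borrow the caller's pointer; safe only if path outlives spec. */
static struct toy_spec *toy_spec_reuse_path(struct toy_pool *pool,
					    const char *path)
{
	struct toy_spec *spec = toy_pool_calloc(pool, sizeof(*spec));

	spec->path = path;
	spec->count = 1;
	return spec;
}
```

Borrowing saves a strlen() and a memcpy() per filespec, but it is only
safe when the string lives at least as long as the spec. Here that holds
because both come from the same pool; strings with a different lifetime
would need to be copied into the pool first.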

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.5 ms ±  3.4 ms     198.3 ms ±  2.9 ms
    mega-renames:     679.1 ms ±  5.6 ms     661.8 ms ±  5.9 ms
    just-one-mega:    271.9 ms ±  2.8 ms     264.6 ms ±  2.5 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index f4f0a3d57f0..86ab8f60121 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -694,17 +694,13 @@ static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 						 const char *path)
 {
 	struct diff_filespec *spec;
-	size_t len;
 
 	if (!pool)
 		return alloc_filespec(path);
 
-	/* Same code as alloc_filespec, except allocate from pool */
-	len = strlen(path);
-
-	spec = mem_pool_calloc(pool, 1, st_add3(sizeof(*spec), len, 1));
-	memcpy(spec+1, path, len);
-	spec->path = (void*)(spec+1);
+	/* Similar to alloc_filespec, but allocate from pool and reuse path */
+	spec = mem_pool_calloc(pool, 1, sizeof(*spec));
+	spec->path = (char*)path; /* spec won't modify it */
 
 	spec->count = 1;
 	spec->is_binary = -1;
@@ -2904,6 +2900,25 @@ static void use_cached_pairs(struct merge_options *opt,
 		const char *new_name = entry->value;
 		if (!new_name)
 			new_name = old_name;
+		if (pool) {
+			/*
+			 * cached_pairs has _copies_ of old_name and new_name,
+			 * because it has to persist across merges.  When
+			 *   pool != NULL
+			 * pool_alloc_filespec() will just re-use the existing
+			 * filenames, which will also get re-used by
+			 * opt->priv->paths if they become renames, and then
+			 * get freed at the end of the merge, leaving the copy
+			 * in cached_pairs dangling.  Avoid this by making a
+			 * copy here.
+			 *
+			 * When pool == NULL, pool_alloc_filespec() calls
+			 * alloc_filespec(), which makes a copy; we don't want
+			 * to add another.
+			 */
+			old_name = mem_pool_strdup(pool, old_name);
+			new_name = mem_pool_strdup(pool, new_name);
+		}
 
 		/* We don't care about oid/mode, only filenames and status */
 		one = pool_alloc_filespec(pool, old_name);
-- 
gitgitgadget



* [PATCH v3 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
                       ` (7 preceding siblings ...)
  2021-07-30 11:47     ` [PATCH v3 8/9] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
@ 2021-07-30 11:47     ` Elijah Newren via GitGitGadget
  2021-07-30 16:24       ` Jeff King
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
  9 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-30 11:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Simplify code maintenance a bit by removing the ability to toggle
between usage of memory pools and direct allocations.  This allows us to
also remove and simplify some auxiliary functions.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 63 +++++++++++++----------------------------------------
 1 file changed, 15 insertions(+), 48 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 86ab8f60121..63829f5cace 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -37,8 +37,6 @@
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
 
-#define USE_MEMORY_POOL 1 /* faster, but obscures memory leak hunting */
-
 /*
  * We have many arrays of size 3.  Whenever we have such an array, the
  * indices refer to one of the sides of the three-way merge.  This is so
@@ -623,11 +621,9 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_clear(&opti->output, 0);
 	}
 
-#if USE_MEMORY_POOL
 	mem_pool_discard(&opti->internal_pool, 0);
 	if (!reinitialize)
 		opti->pool = NULL;
-#endif
 
 	/* Clean out callback_data as well. */
 	FREE_AND_NULL(renames->callback_data);
@@ -693,12 +689,10 @@ static void path_msg(struct merge_options *opt,
 static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 						 const char *path)
 {
+	/* Similar to alloc_filespec(), but allocate from pool and reuse path */
 	struct diff_filespec *spec;
 
-	if (!pool)
-		return alloc_filespec(path);
-
-	/* Similar to alloc_filespec, but allocate from pool and reuse path */
+	assert(pool != NULL);
 	spec = mem_pool_calloc(pool, 1, sizeof(*spec));
 	spec->path = (char*)path; /* spec won't modify it */
 
@@ -712,12 +706,10 @@ static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
 					     struct diff_filespec *one,
 					     struct diff_filespec *two)
 {
+	/* Same code as diff_queue(), except allocate from pool */
 	struct diff_filepair *dp;
 
-	if (!pool)
-		return diff_queue(queue, one, two);
-
-	/* Same code as diff_queue, except allocate from pool */
+	assert(pool != NULL);
 	dp = mem_pool_calloc(pool, 1, sizeof(*dp));
 	dp->one = one;
 	dp->two = two;
@@ -726,27 +718,6 @@ static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
 	return dp;
 }
 
-static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
-{
-	if (!pool)
-		return xcalloc(count, size);
-	return mem_pool_calloc(pool, count, size);
-}
-
-static void *pool_alloc(struct mem_pool *pool, size_t size)
-{
-	if (!pool)
-		return xmalloc(size);
-	return mem_pool_alloc(pool, size);
-}
-
-static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
-{
-	if (!pool)
-		return xstrndup(str, len);
-	return mem_pool_strndup(pool, str, len);
-}
-
 /* add a string to a strbuf, but converting "/" to "_" */
 static void add_flattened_path(struct strbuf *out, const char *s)
 {
@@ -875,9 +846,9 @@ static void setup_path_info(struct merge_options *opt,
 	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
 	assert(resolved == (merged_version != NULL));
 
-	mi = pool_calloc(opt->priv->pool, 1,
-			 resolved ? sizeof(struct merged_info) :
-				    sizeof(struct conflict_info));
+	mi = mem_pool_calloc(opt->priv->pool, 1,
+			     resolved ? sizeof(struct merged_info) :
+					sizeof(struct conflict_info));
 	mi->directory_name = current_dir_name;
 	mi->basename_offset = current_dir_name_len;
 	mi->clean = !!resolved;
@@ -1170,7 +1141,7 @@ static int collect_merge_info_callback(int n,
 	len = traverse_path_len(info, p->pathlen);
 
 	/* +1 in both of the following lines to include the NUL byte */
-	fullpath = pool_alloc(opt->priv->pool, len + 1);
+	fullpath = mem_pool_alloc(opt->priv->pool, len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
@@ -2389,9 +2360,9 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		/* Find the parent directory of cur_path */
 		char *last_slash = strrchr(cur_path, '/');
 		if (last_slash) {
-			parent_name = pool_strndup(opt->priv->pool,
-						   cur_path,
-						   last_slash - cur_path);
+			parent_name = mem_pool_strndup(opt->priv->pool,
+						       cur_path,
+						       last_slash - cur_path);
 		} else {
 			parent_name = opt->priv->toplevel_dir;
 			break;
@@ -3701,7 +3672,7 @@ static void process_entry(struct merge_options *opt,
 		 * the directory to remain here, so we need to move this
 		 * path to some new location.
 		 */
-		new_ci = pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
+		new_ci = mem_pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
 
 		/* We don't really want new_ci->merged.result copied, but it'll
 		 * be overwritten below so it doesn't matter.  We also don't
@@ -3794,7 +3765,8 @@ static void process_entry(struct merge_options *opt,
 			const char *a_path = NULL, *b_path = NULL;
 			int rename_a = 0, rename_b = 0;
 
-			new_ci = pool_alloc(opt->priv->pool, sizeof(*new_ci));
+			new_ci = mem_pool_alloc(opt->priv->pool,
+						sizeof(*new_ci));
 
 			if (S_ISREG(a_mode))
 				rename_a = 1;
@@ -4482,13 +4454,8 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	/* Initialization of various renames fields */
 	renames = &opt->priv->renames;
-#if USE_MEMORY_POOL
 	mem_pool_init(&opt->priv->internal_pool, 0);
-	opt->priv->pool = &opt->priv->internal_pool;
-#else
-	opt->priv->pool = NULL;
-#endif
-	pool = opt->priv->pool;
+	pool = opt->priv->pool = &opt->priv->internal_pool;
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
 					    NOT_RELEVANT, pool, 0);
-- 
gitgitgadget


* Re: [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools
  2021-07-29 20:09         ` Jeff King
  2021-07-30  2:30           ` Elijah Newren
@ 2021-07-30 13:30           ` Ævar Arnfjörð Bjarmason
  2021-07-30 14:36             ` Elijah Newren
  1 sibling, 1 reply; 65+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-30 13:30 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git Mailing List,
	Eric Sunshine, Derrick Stolee


On Thu, Jul 29 2021, Jeff King wrote:

> On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote:
>
>> > Arguably, the existence of these function indirections is perhaps a sign
>> > that the strmap API should provide a version of the clear functions that
>> > takes "partial / not-partial" as a parameter.
>> 
>> Are you suggesting a modification of str{map,intmap,set}_clear() to
>> take an extra parameter, or removing the
>> str{map,intmap,set}_partial_clear() functions and introducing new
>> functions that take a partial/not-partial parameter?  I think you're
>> suggesting the latter, and that makes more sense to me...but I'm
>> drawing blanks trying to come up with a reasonable function name.
>
> It does seem a shame to add the "partial" parameter to strmap_clear(),
> just because most callers don't need it (so they end up with this
> inscrutable "0" parameter).
>
> What if there was a flags field? Then it could be combined with the
> free_values parameter. The result is kind of verbose in two ways:
>
>  - now strset_clear(), etc, need a "flags" parameter, which they didn't
>    before (and is just "0" most of the time!)
>
>  - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
>    That's a lot longer, though arguably it's easier to understand since
>    the boolean is explained.
>
> Having gone through the exercise, I am not sure it is actually making
> anything more readable (messy patch is below for reference).

I've got some WIP patches for string-list.h and strmap.h to make the API
nicer, and it's probably applicable to strset.h too.

I.e. I found when using strset.h that it was a weird API to use, because
unlike string-list.h it didn't pay attention to your "dup" field when
freeing, you had to do it explicitly.

And then in e.g. merge-ort.c there's this "strdup dance" pattern where
we flip the field back and forth.

The below diff is extracted from that WIP work, with the relevant two
API headers and then two changed API users for show (the tree-wide
changes are much larger).

I think making the promise I make in the updated docs at "We guarantee
that the `clearfunc`[...]" in string-list.h makes for particularly nice
API behavior.

 builtin/remote.c | 37 ++++++++++++++++++++---------------
 merge-ort.c      | 32 +++++++-----------------------
 string-list.h    | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 strmap.h         | 13 +++++++++++++
 4 files changed, 98 insertions(+), 43 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index 7f88e6ce9de..ec1dbd49f71 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -340,10 +340,24 @@ static void read_branches(void)
 
 struct ref_states {
 	struct remote *remote;
-	struct string_list new_refs, stale, tracked, heads, push;
+
+	struct string_list new_refs;
+	struct string_list stale;
+	struct string_list tracked;
+	struct string_list heads;
+	struct string_list push;
+
 	int queried;
 };
 
+#define REF_STATES_INIT { \
+	.new_refs = STRING_LIST_INIT_DUP, \
+	.stale = STRING_LIST_INIT_DUP, \
+	.tracked = STRING_LIST_INIT_DUP, \
+	.heads = STRING_LIST_INIT_DUP, \
+	.push = STRING_LIST_INIT_DUP, \
+}
+
 static int get_ref_states(const struct ref *remote_refs, struct ref_states *states)
 {
 	struct ref *fetch_map = NULL, **tail = &fetch_map;
@@ -355,9 +369,6 @@ static int get_ref_states(const struct ref *remote_refs, struct ref_states *stat
 			die(_("Could not get fetch map for refspec %s"),
 				states->remote->fetch.raw[i]);
 
-	states->new_refs.strdup_strings = 1;
-	states->tracked.strdup_strings = 1;
-	states->stale.strdup_strings = 1;
 	for (ref = fetch_map; ref; ref = ref->next) {
 		if (!ref->peer_ref || !ref_exists(ref->peer_ref->name))
 			string_list_append(&states->new_refs, abbrev_branch(ref->name));
@@ -406,7 +417,6 @@ static int get_push_ref_states(const struct ref *remote_refs,
 
 	match_push_refs(local_refs, &push_map, &remote->push, MATCH_REFS_NONE);
 
-	states->push.strdup_strings = 1;
 	for (ref = push_map; ref; ref = ref->next) {
 		struct string_list_item *item;
 		struct push_info *info;
@@ -449,7 +459,6 @@ static int get_push_ref_states_noquery(struct ref_states *states)
 	if (remote->mirror)
 		return 0;
 
-	states->push.strdup_strings = 1;
 	if (!remote->push.nr) {
 		item = string_list_append(&states->push, _("(matching)"));
 		info = item->util = xcalloc(1, sizeof(struct push_info));
@@ -483,7 +492,6 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat
 	refspec.force = 0;
 	refspec.pattern = 1;
 	refspec.src = refspec.dst = "refs/heads/*";
-	states->heads.strdup_strings = 1;
 	get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
 	matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"),
 				    fetch_map, 1);
@@ -905,7 +913,7 @@ static void clear_push_info(void *util, const char *string)
 {
 	struct push_info *info = util;
 	free(info->dest);
-	free(info);
+	/* note: fixed memleak here */
 }
 
 static void free_remote_ref_states(struct ref_states *states)
@@ -1159,7 +1167,7 @@ static int get_one_entry(struct remote *remote, void *priv)
 		string_list_append(list, remote->name)->util =
 				strbuf_detach(&url_buf, NULL);
 	} else
-		string_list_append(list, remote->name)->util = NULL;
+		string_list_append(list, remote->name);
 	if (remote->pushurl_nr) {
 		url = remote->pushurl;
 		url_nr = remote->pushurl_nr;
@@ -1179,10 +1187,9 @@ static int get_one_entry(struct remote *remote, void *priv)
 
 static int show_all(void)
 {
-	struct string_list list = STRING_LIST_INIT_NODUP;
+	struct string_list list = STRING_LIST_INIT_DUP;
 	int result;
 
-	list.strdup_strings = 1;
 	result = for_each_remote(get_one_entry, &list);
 
 	if (!result) {
@@ -1212,7 +1219,7 @@ static int show(int argc, const char **argv)
 		OPT_BOOL('n', NULL, &no_query, N_("do not query remotes")),
 		OPT_END()
 	};
-	struct ref_states states;
+	struct ref_states states = REF_STATES_INIT;
 	struct string_list info_list = STRING_LIST_INIT_NODUP;
 	struct show_info info;
 
@@ -1334,8 +1341,7 @@ static int set_head(int argc, const char **argv)
 	if (!opt_a && !opt_d && argc == 2) {
 		head_name = xstrdup(argv[1]);
 	} else if (opt_a && !opt_d && argc == 1) {
-		struct ref_states states;
-		memset(&states, 0, sizeof(states));
+		struct ref_states states = REF_STATES_INIT;
 		get_remote_ref_states(argv[0], &states, GET_HEAD_NAMES);
 		if (!states.heads.nr)
 			result |= error(_("Cannot determine remote HEAD"));
@@ -1374,14 +1380,13 @@ static int set_head(int argc, const char **argv)
 static int prune_remote(const char *remote, int dry_run)
 {
 	int result = 0;
-	struct ref_states states;
+	struct ref_states states = REF_STATES_INIT;
 	struct string_list refs_to_prune = STRING_LIST_INIT_NODUP;
 	struct string_list_item *item;
 	const char *dangling_msg = dry_run
 		? _(" %s will become dangling!")
 		: _(" %s has become dangling!");
 
-	memset(&states, 0, sizeof(states));
 	get_remote_ref_states(remote, &states, GET_REF_STATES);
 
 	if (!states.stale.nr) {
diff --git a/merge-ort.c b/merge-ort.c
index ec0c5904211..53ed78e7a01 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -432,16 +432,6 @@ struct conflict_info {
 	assert((ci) && !(mi)->clean);        \
 } while (0)
 
-static void free_strmap_strings(struct strmap *map)
-{
-	struct hashmap_iter iter;
-	struct strmap_entry *entry;
-
-	strmap_for_each_entry(map, &iter, entry) {
-		free((char*)entry->key);
-	}
-}
-
 static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 					  int reinitialize)
 {
@@ -455,13 +445,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		reinitialize ? strset_partial_clear : strset_clear;
 
 	/*
-	 * We marked opti->paths with strdup_strings = 0, so that we
-	 * wouldn't have to make another copy of the fullpath created by
-	 * make_traverse_path from setup_path_info().  But, now that we've
-	 * used it and have no other references to these strings, it is time
-	 * to deallocate them.
+	 * We used the pattern of re-using already allocated
+	 * strings strmap_clear_strings() in make_traverse_path from
+	 * setup_path_info(). Deallocate them.
 	 */
-	free_strmap_strings(&opti->paths);
+	strmap_clear_strings(&opti->paths, 0);
 	strmap_func(&opti->paths, 1);
 
 	/*
@@ -472,15 +460,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	strmap_func(&opti->conflicted, 0);
 
 	/*
-	 * opti->paths_to_free is similar to opti->paths; we created it with
-	 * strdup_strings = 0 to avoid making _another_ copy of the fullpath
-	 * but now that we've used it and have no other references to these
-	 * strings, it is time to deallocate them.  We do so by temporarily
-	 * setting strdup_strings to 1.
+	 * opti->paths_to_free is similar to opti->paths; it's memory
+	 * we borrowed and need to free with string_list_clear_strings().
 	 */
-	opti->paths_to_free.strdup_strings = 1;
-	string_list_clear(&opti->paths_to_free, 0);
-	opti->paths_to_free.strdup_strings = 0;
+	string_list_clear_strings(&opti->paths_to_free, 0);
 
 	if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
 		discard_index(&opti->attr_index);
@@ -2664,7 +2647,6 @@ static int collect_renames(struct merge_options *opt,
 	 * and have no other references to these strings, it is time to
 	 * deallocate them.
 	 */
-	free_strmap_strings(&collisions);
 	strmap_clear(&collisions, 1);
 	return clean;
 }
diff --git a/string-list.h b/string-list.h
index 0d6b4692396..9eeea996888 100644
--- a/string-list.h
+++ b/string-list.h
@@ -109,6 +109,9 @@ void string_list_init_dup(struct string_list *list);
  */
 void string_list_init(struct string_list *list, int strdup_strings);
 
+void string_list_cmp_init(struct string_list *list, int strdup_strings,
+			  compare_strings_fn cmp);
+
 /** Callback function type for for_each_string_list */
 typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
 
@@ -129,14 +132,66 @@ void filter_string_list(struct string_list *list, int free_util,
  */
 void string_list_clear(struct string_list *list, int free_util);
 
+/**
+ * Free a string list initialized without `strdup_strings = 1`, but
+ * where we also want to free() the strings. You usually want to just
+ * use string_list_clear() after initializing with
+ * `STRING_LIST_INIT_DUP` instead.
+ *
+ * Useful to free e.g. a string list whose strings came from
+ * strbuf_detach() or other memory that we didn't initially allocate
+ * on the heap, but which we now manage.
+ *
+ * Under the hood this is identical in behavior to temporarily setting
+ * `strdup_strings` to `1` for the duration of this function call, but
+ * without the verbosity of performing that dance yourself.
+ */
+void string_list_clear_strings(struct string_list *list, int free_util);
+
+/**
+ * Clear only the `util` pointer, but not the `string`, even if
+ * `strdup_strings = 1` is set. Useful for the idiom of doing e.g.:
+ *
+ *    string_list_append(&list, str + offs)->util = str;
+ *
+ * Where we add a string at some offset, own the string (so
+ * effectively `strdup_strings = `), but can't free() the string
+ * itself at the changed offset, but need to free the original data in
+ * `util` instead.
+ */
+void string_list_clear_util(struct string_list *list);
+
 /**
  * Callback type for `string_list_clear_func`.  The string associated
  * with the util pointer is passed as the second argument
  */
 typedef void (*string_list_clear_func_t)(void *p, const char *str);
 
-/** Call a custom clear function on each util pointer */
-void string_list_clear_func(struct string_list *list, string_list_clear_func_t clearfunc);
+/**
+ * Like string_list_clear() except that it first calls a custom clear
+ * function on each util pointer.
+ *
+ * We guarantee that the `clearfunc` will be called on all util
+ * pointers in a list before we proceed to free the first string or
+ * util pointer, i.e. should you need to it's OK to peek at other util
+ * items in the list itself, or to otherwise iterate it from within
+ * the `clearfunc`.
+ *
+ * You do not need to free() the passed-in util pointer itself,
+ * i.e. after calling all `clearfunc` this has the same behavior as
+ * string_list_clear() called with `free_util = 1`.
+ */
+void string_list_clear_func(struct string_list *list,
+			    string_list_clear_func_t clearfunc);
+
+/**
+ * Like string_list_clear_func() but free the strings too, using the
+ * same dance as described for string_list_clear_strings()
+ * above. You'll usually want to initialize with
+ * `STRING_LIST_INIT_DUP` and use string_list_clear_strings() instead.
+ */
+void string_list_clear_strings_func(struct string_list *list,
+				    string_list_clear_func_t clearfunc);
 
 /**
  * Apply `func` to each item. If `func` returns nonzero, the
diff --git a/strmap.h b/strmap.h
index 1e152d832d6..337f6278e86 100644
--- a/strmap.h
+++ b/strmap.h
@@ -51,12 +51,25 @@ void strmap_init_with_options(struct strmap *map,
  */
 void strmap_clear(struct strmap *map, int free_values);
 
+/**
+ * To strmap_clear() what string_list_clear_strings() is to
+ * string_list_clear(). I.e. free your keys too, which we used as-is
+ * without `strdup_strings = 1`.
+ */
+void strmap_clear_strings(struct strmap *map, int free_values);
+
 /*
  * Similar to strmap_clear() but leaves map->map->table allocated and
  * pre-sized so that subsequent uses won't need as many rehashings.
  */
 void strmap_partial_clear(struct strmap *map, int free_values);
 
+/**
+ * To strmap_partial_clear() what string_list_clear_strings() is to
+ * string_list_clear(). See strmap_clear_strings() above.
+ */
+void strmap_partial_clear_strings(struct strmap *map, int free_values);
+
 /*
  * Insert "str" into the map, pointing to "data".
  *


* Re: [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools
  2021-07-30 13:30           ` Ævar Arnfjörð Bjarmason
@ 2021-07-30 14:36             ` Elijah Newren
  2021-07-30 16:23               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-07-30 14:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Elijah Newren via GitGitGadget, Git Mailing List,
	Eric Sunshine, Derrick Stolee

Hi Ævar,

On Fri, Jul 30, 2021 at 7:33 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Thu, Jul 29 2021, Jeff King wrote:
>
> > On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote:
> >
> >> > Arguably, the existence of these function indirections is perhaps a sign
> >> > that the strmap API should provide a version of the clear functions that
> >> > takes "partial / not-partial" as a parameter.
> >>
> >> Are you suggesting a modification of str{map,intmap,set}_clear() to
> >> take an extra parameter, or removing the
> >> str{map,intmap,set}_partial_clear() functions and introducing new
> >> functions that take a partial/not-partial parameter?  I think you're
> >> suggesting the latter, and that makes more sense to me...but I'm
> >> drawing blanks trying to come up with a reasonable function name.
> >
> > It does seem a shame to add the "partial" parameter to strmap_clear(),
> > just because most callers don't need it (so they end up with this
> > inscrutable "0" parameter).
> >
> > What if there was a flags field? Then it could be combined with the
> > free_values parameter. The result is kind of verbose in two ways:
> >
> >  - now strset_clear(), etc, need a "flags" parameter, which they didn't
> >    before (and is just "0" most of the time!)
> >
> >  - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
> >    That's a lot longer, though arguably it's easier to understand since
> >    the boolean is explained.
> >
> > Having gone through the exercise, I am not sure it is actually making
> > anything more readable (messy patch is below for reference).
>
> I've got some WIP patches for string-list.h and strmap.h to make the API
> nicer, and it's probably applicable to strset.h too.

There is no strset.h; strset and strintmap along with strmap are part
of strmap.h.

> I.e. I found when using strset.h that it was a weird API to use, because
> unlike string-list.h it didn't pay attention to your "dup" field when
> freeing, you had to do it explicitly.

Do you mean strmap.h instead of strset.h?

In general, if you are asking strmap/strset/strintmap to dup your keys
and are explicitly freeing the strings, then you are misusing the API
and either freeing pointers that were never allocated or getting
double frees.  It's wrong to explicitly deallocate them because:
  * When using a pool, we just allocate from the pool.  The memory
will be freed when the pool is freed.
  * When not using a pool, we use FLEXPTR_ALLOC_STR in order to make
the string be part of the allocated strmap_entry.  The string's memory
is deallocated when the strmap_entry is.

The only reason to explicitly free keys in a strmap/strset/strintmap
is if you do NOT have strdup_strings set and allocated the strings
elsewhere and left your strmap as the only thing tracking the strings.
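
To make the ownership rule concrete, here is a minimal sketch of the
"key lives inside the entry" layout. It is a toy: `struct toy_entry`
and the `toy_entry_*` helpers are hypothetical names made up for
illustration, and the flexible array member only approximates what
FLEXPTR_ALLOC_STR arranges in the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a map entry; NOT git's real struct strmap_entry. */
struct toy_entry {
	const char *key;  /* points at keydata[] when the key was duplicated */
	void *value;
	char keydata[];   /* C99 flexible array member */
};

/* "strdup_strings = 1": copy the key into the entry's own allocation. */
static struct toy_entry *toy_entry_new_dup(const char *key, void *value)
{
	size_t len = strlen(key);
	struct toy_entry *e = malloc(sizeof(*e) + len + 1);

	if (!e)
		return NULL;
	memcpy(e->keydata, key, len + 1);
	e->key = e->keydata;
	e->value = value;
	return e;
}

/* "strdup_strings = 0": borrow a key that was allocated elsewhere. */
static struct toy_entry *toy_entry_new_nodup(const char *key, void *value)
{
	struct toy_entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->key = key;
	e->value = value;
	return e;
}

/* Nonzero if free(entry) also releases the key. */
static int toy_entry_owns_key(const struct toy_entry *e)
{
	return e->key == e->keydata;
}
```

With the duplicated layout a single free() of the entry releases the
key too, so a separate free() of the key would be a free of a pointer
that malloc() never returned. Only the borrowed layout leaves a key
that somebody has to free explicitly.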

> And then in e.g. merge-ort.c there's this "strdup dance" pattern where
> we flip the field back and forth.
>
> The below diff is extracted from that WIP work, with the relevant two
> API headers and then two changed API users for show (the tree-wide
> changes are much larger).
>
> I think making the promise I make in the updated docs at "We guarantee
> that the `clearfunc`[...]" in string-list.h makes for particularly nice
> API behavior.
>
>  builtin/remote.c | 37 ++++++++++++++++++++---------------
>  merge-ort.c      | 32 +++++++-----------------------
>  string-list.h    | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  strmap.h         | 13 +++++++++++++
>  4 files changed, 98 insertions(+), 43 deletions(-)
>
> diff --git a/builtin/remote.c b/builtin/remote.c
> index 7f88e6ce9de..ec1dbd49f71 100644
> --- a/builtin/remote.c
> +++ b/builtin/remote.c
> @@ -340,10 +340,24 @@ static void read_branches(void)
>
>  struct ref_states {
>         struct remote *remote;
> -       struct string_list new_refs, stale, tracked, heads, push;
> +
> +       struct string_list new_refs;
> +       struct string_list stale;
> +       struct string_list tracked;
> +       struct string_list heads;
> +       struct string_list push;
> +
>         int queried;
>  };
>
> +#define REF_STATES_INIT { \
> +       .new_refs = STRING_LIST_INIT_DUP, \
> +       .stale = STRING_LIST_INIT_DUP, \
> +       .tracked = STRING_LIST_INIT_DUP, \
> +       .heads = STRING_LIST_INIT_DUP, \
> +       .push = STRING_LIST_INIT_DUP, \
> +}
> +
>  static int get_ref_states(const struct ref *remote_refs, struct ref_states *states)
>  {
>         struct ref *fetch_map = NULL, **tail = &fetch_map;
> @@ -355,9 +369,6 @@ static int get_ref_states(const struct ref *remote_refs, struct ref_states *stat
>                         die(_("Could not get fetch map for refspec %s"),
>                                 states->remote->fetch.raw[i]);
>
> -       states->new_refs.strdup_strings = 1;
> -       states->tracked.strdup_strings = 1;
> -       states->stale.strdup_strings = 1;
>         for (ref = fetch_map; ref; ref = ref->next) {
>                 if (!ref->peer_ref || !ref_exists(ref->peer_ref->name))
>                         string_list_append(&states->new_refs, abbrev_branch(ref->name));
> @@ -406,7 +417,6 @@ static int get_push_ref_states(const struct ref *remote_refs,
>
>         match_push_refs(local_refs, &push_map, &remote->push, MATCH_REFS_NONE);
>
> -       states->push.strdup_strings = 1;
>         for (ref = push_map; ref; ref = ref->next) {
>                 struct string_list_item *item;
>                 struct push_info *info;
> @@ -449,7 +459,6 @@ static int get_push_ref_states_noquery(struct ref_states *states)
>         if (remote->mirror)
>                 return 0;
>
> -       states->push.strdup_strings = 1;
>         if (!remote->push.nr) {
>                 item = string_list_append(&states->push, _("(matching)"));
>                 info = item->util = xcalloc(1, sizeof(struct push_info));
> @@ -483,7 +492,6 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat
>         refspec.force = 0;
>         refspec.pattern = 1;
>         refspec.src = refspec.dst = "refs/heads/*";
> -       states->heads.strdup_strings = 1;
>         get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
>         matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"),
>                                     fetch_map, 1);
> @@ -905,7 +913,7 @@ static void clear_push_info(void *util, const char *string)
>  {
>         struct push_info *info = util;
>         free(info->dest);
> -       free(info);
> +       /* note: fixed memleak here */
>  }
>
>  static void free_remote_ref_states(struct ref_states *states)
> @@ -1159,7 +1167,7 @@ static int get_one_entry(struct remote *remote, void *priv)
>                 string_list_append(list, remote->name)->util =
>                                 strbuf_detach(&url_buf, NULL);
>         } else
> -               string_list_append(list, remote->name)->util = NULL;
> +               string_list_append(list, remote->name);
>         if (remote->pushurl_nr) {
>                 url = remote->pushurl;
>                 url_nr = remote->pushurl_nr;
> @@ -1179,10 +1187,9 @@ static int get_one_entry(struct remote *remote, void *priv)
>
>  static int show_all(void)
>  {
> -       struct string_list list = STRING_LIST_INIT_NODUP;
> +       struct string_list list = STRING_LIST_INIT_DUP;
>         int result;
>
> -       list.strdup_strings = 1;
>         result = for_each_remote(get_one_entry, &list);
>
>         if (!result) {
> @@ -1212,7 +1219,7 @@ static int show(int argc, const char **argv)
>                 OPT_BOOL('n', NULL, &no_query, N_("do not query remotes")),
>                 OPT_END()
>         };
> -       struct ref_states states;
> +       struct ref_states states = REF_STATES_INIT;
>         struct string_list info_list = STRING_LIST_INIT_NODUP;
>         struct show_info info;
>
> @@ -1334,8 +1341,7 @@ static int set_head(int argc, const char **argv)
>         if (!opt_a && !opt_d && argc == 2) {
>                 head_name = xstrdup(argv[1]);
>         } else if (opt_a && !opt_d && argc == 1) {
> -               struct ref_states states;
> -               memset(&states, 0, sizeof(states));
> +               struct ref_states states = REF_STATES_INIT;
>                 get_remote_ref_states(argv[0], &states, GET_HEAD_NAMES);
>                 if (!states.heads.nr)
>                         result |= error(_("Cannot determine remote HEAD"));
> @@ -1374,14 +1380,13 @@ static int set_head(int argc, const char **argv)
>  static int prune_remote(const char *remote, int dry_run)
>  {
>         int result = 0;
> -       struct ref_states states;
> +       struct ref_states states = REF_STATES_INIT;
>         struct string_list refs_to_prune = STRING_LIST_INIT_NODUP;
>         struct string_list_item *item;
>         const char *dangling_msg = dry_run
>                 ? _(" %s will become dangling!")
>                 : _(" %s has become dangling!");
>
> -       memset(&states, 0, sizeof(states));
>         get_remote_ref_states(remote, &states, GET_REF_STATES);
>
>         if (!states.stale.nr) {

Everything up to here looks like a very nice cleanup.

> diff --git a/merge-ort.c b/merge-ort.c
> index ec0c5904211..53ed78e7a01 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -432,16 +432,6 @@ struct conflict_info {
>         assert((ci) && !(mi)->clean);        \
>  } while (0)
>
> -static void free_strmap_strings(struct strmap *map)
> -{
> -       struct hashmap_iter iter;
> -       struct strmap_entry *entry;
> -
> -       strmap_for_each_entry(map, &iter, entry) {
> -               free((char*)entry->key);
> -       }
> -}
> -
>  static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>                                           int reinitialize)
>  {
> @@ -455,13 +445,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>                 reinitialize ? strset_partial_clear : strset_clear;
>
>         /*
> -        * We marked opti->paths with strdup_strings = 0, so that we
> -        * wouldn't have to make another copy of the fullpath created by
> -        * make_traverse_path from setup_path_info().  But, now that we've
> -        * used it and have no other references to these strings, it is time
> -        * to deallocate them.
> +        * We used the pattern of re-using already allocated
> +        * strings strmap_clear_strings() in make_traverse_path from
> +        * setup_path_info(). Deallocate them.
>          */
> -       free_strmap_strings(&opti->paths);
> +       strmap_clear_strings(&opti->paths, 0);
>         strmap_func(&opti->paths, 1);
>
>         /*

It's not clear to me that strmap should handle the freeing of the keys
at all; maybe it should and strmap_clear_strings() makes sense to
introduce.  However, this change is clearly wrong regardless, for two
reasons: (1) You are double clearing since strmap_func() is also
called afterwards, and (2) you are also ignoring the potential partial
bit since strmap_func might be strmap_partial_clear() rather than
strmap_clear().
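
A rough sketch of a shape that would avoid both problems, assuming
key-freeing variants exist for both the full and the partial clear:
pick one of them through the same kind of function pointer and call it
exactly once. Everything here (`struct toy_map` and the `toy_*`
functions) is a made-up stand-in for illustration, not the strmap
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy map: a table of heap-allocated keys. Not git's struct strmap. */
struct toy_map {
	char **table;       /* table_size slots, the first nr are in use */
	size_t nr;
	size_t table_size;
};

static void toy_free_keys(struct toy_map *m)
{
	size_t i;

	for (i = 0; i < m->nr; i++) {
		free(m->table[i]);
		m->table[i] = NULL;
	}
}

/* Full clear: free the keys and drop the table as well. */
static void toy_clear_strings(struct toy_map *m)
{
	toy_free_keys(m);
	free(m->table);
	m->table = NULL;
	m->nr = m->table_size = 0;
}

/* Partial clear: free the keys but keep the table allocated for reuse. */
static void toy_partial_clear_strings(struct toy_map *m)
{
	toy_free_keys(m);
	m->nr = 0;
}

static void toy_reset(struct toy_map *m, int reinitialize)
{
	/* Select once, honoring the partial bit; no second clear follows. */
	void (*clear_strings_func)(struct toy_map *) =
		reinitialize ? toy_partial_clear_strings : toy_clear_strings;

	clear_strings_func(m);
}
```

The point is only the control flow: the partial/full decision is made
in one place, and the keys and the map are released by a single call
rather than by a key-freeing clear followed by another clear.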

> @@ -472,15 +460,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>         strmap_func(&opti->conflicted, 0);
>
>         /*
> -        * opti->paths_to_free is similar to opti->paths; we created it with
> -        * strdup_strings = 0 to avoid making _another_ copy of the fullpath
> -        * but now that we've used it and have no other references to these
> -        * strings, it is time to deallocate them.  We do so by temporarily
> -        * setting strdup_strings to 1.
> +        * opti->paths_to_free is similar to opti->paths; it's memory
> +        * we borrowed and need to free with string_list_clear_strings().
>          */
> -       opti->paths_to_free.strdup_strings = 1;
> -       string_list_clear(&opti->paths_to_free, 0);
> -       opti->paths_to_free.strdup_strings = 0;
> +       string_list_clear_strings(&opti->paths_to_free, 0);

This is very nice.  I really like this new function and API.
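
For readers following along, the "dance" that the new function hides
can be sketched on a toy list. This is a simplified model under my own
assumptions; `struct toy_list` and the `toy_list_*` names are
hypothetical and the real string-list code differs in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_item {
	char *string;
	void *util;
};

/* Toy model of a string list; not git's struct string_list. */
struct toy_list {
	struct toy_item *items;
	size_t nr, alloc;
	unsigned int strdup_strings : 1;
};

static char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

static void toy_list_append(struct toy_list *list, char *string)
{
	if (list->nr == list->alloc) {
		list->alloc = list->alloc ? 2 * list->alloc : 4;
		list->items = realloc(list->items,
				      list->alloc * sizeof(*list->items));
		if (!list->items)
			abort();
	}
	list->items[list->nr].string =
		list->strdup_strings ? toy_strdup(string) : string;
	list->items[list->nr].util = NULL;
	list->nr++;
}

/* Frees the strings only if the list duplicated them itself. */
static void toy_list_clear(struct toy_list *list, int free_util)
{
	size_t i;

	for (i = 0; i < list->nr; i++) {
		if (list->strdup_strings)
			free(list->items[i].string);
		if (free_util)
			free(list->items[i].util);
	}
	free(list->items);
	list->items = NULL;
	list->nr = list->alloc = 0;
}

/* The dance in one place: pretend we own the strings while clearing. */
static void toy_list_clear_strings(struct toy_list *list, int free_util)
{
	unsigned int saved = list->strdup_strings;

	list->strdup_strings = 1;
	toy_list_clear(list, free_util);
	list->strdup_strings = saved;
}
```

A caller that appended already-allocated strings to a no-dup list can
then hand the whole cleanup to `toy_list_clear_strings(&list, 0)`
instead of flipping the flag by hand around the clear.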

>         if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
>                 discard_index(&opti->attr_index);
> @@ -2664,7 +2647,6 @@ static int collect_renames(struct merge_options *opt,
>          * and have no other references to these strings, it is time to
>          * deallocate them.
>          */
> -       free_strmap_strings(&collisions);
>         strmap_clear(&collisions, 1);
>         return clean;
>  }

This hunk is wrong.

> diff --git a/string-list.h b/string-list.h
> index 0d6b4692396..9eeea996888 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -109,6 +109,9 @@ void string_list_init_dup(struct string_list *list);
>   */
>  void string_list_init(struct string_list *list, int strdup_strings);
>
> +void string_list_cmp_init(struct string_list *list, int strdup_strings,
> +                         compare_strings_fn cmp);
> +

Seems unrelated to what you were trying to highlight?

>  /** Callback function type for for_each_string_list */
>  typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
>
> @@ -129,14 +132,66 @@ void filter_string_list(struct string_list *list, int free_util,
>   */
>  void string_list_clear(struct string_list *list, int free_util);
>
> +/**
> + * Free a string list initialized without `strdup_strings = 1`, but
> + * where we also want to free() the strings. You usually want to just
> + * use string_list_clear() after initializing with
> + * `STRING_LIST_INIT_DUP` instead.
> + *
> + * Useful to free e.g. a string list whose strings came from
> + * strbuf_detach() or other memory that we didn't initially allocate
> + * on the heap, but which we now manage.
> + *
> + * Under the hood this is identical in behavior to temporarily setting
> + * `strdup_strings` to `1` for the duration of this function call, but
> + * without the verbosity of performing that dance yourself.
> + */
> +void string_list_clear_strings(struct string_list *list, int free_util);
> +
> +/**
> + * Clear only the `util` pointer, but not the `string`, even if
> + * `strdup_strings = 1` is set. Useful for the idiom of doing e.g.:
> + *
> + *    string_list_append(&list, str + offs)->util = str;
> + *
> + * Where we add a string at some offset, own the string (so
> + * effectively `strdup_strings = `), but can't free() the string
> + * itself at the changed offset, but need to free the original data in
> + * `util` instead.
> + */
> +void string_list_clear_util(struct string_list *list);
> +
>  /**
>   * Callback type for `string_list_clear_func`.  The string associated
>   * with the util pointer is passed as the second argument
>   */
>  typedef void (*string_list_clear_func_t)(void *p, const char *str);
>
> -/** Call a custom clear function on each util pointer */
> -void string_list_clear_func(struct string_list *list, string_list_clear_func_t clearfunc);
> +/**
> + * Like string_list_clear() except that it first calls a custom clear
> + * function on each util pointer.
> + *
> + * We guarantee that the `clearfunc` will be called on all util
> + * pointers in a list before we proceed to free the first string or
> + * util pointer, i.e. should you need to it's OK to peek at other util
> + * items in the list itself, or to otherwise iterate it from within
> + * the `clearfunc`.
> + *
> + * You do not need to free() the passed-in util pointer itself,
> + * i.e. after calling all `clearfunc` this has the same behavior as
> + * string_list_clear() called with `free_util = 1`.
> + */
> +void string_list_clear_func(struct string_list *list,
> +                           string_list_clear_func_t clearfunc);
> +
> +/**
> + * Like string_list_clear_func() but free the strings too, using the
> + * same dance as described for string_list_clear_strings()
> + * above. You'll usually want to initialize with
> + * `STRING_LIST_INIT_DUP` and use string_list_clear_strings() instead.
> + */
> +void string_list_clear_strings_func(struct string_list *list,
> +                                   string_list_clear_func_t clearfunc);
>
>  /**
>   * Apply `func` to each item. If `func` returns nonzero, the

string_list_clear_strings() looks very nice.  The others are probably
good too, though I'm curious about the need for double walking the
list to free it instead of doing it in a single walk; what callers
need to walk the list and check out other values?
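
To illustrate what the two-pass guarantee would buy, here is a purely
hypothetical case (I am not claiming any current caller looks like
this): each util points at a sibling's util, and the callback reads a
field of that sibling. All names below (`struct toy_info`,
`toy_clear_func`, `note_peer`) are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical util payload that refers to another item's util. */
struct toy_info {
	int id;
	struct toy_info *peer;  /* may be NULL */
};

struct toy_item {
	const char *string;
	void *util;
};

typedef void (*toy_clear_func_t)(void *util, const char *str);

static int peer_id_sum;

/* Callback that peeks at a sibling's util while being cleared. */
static void note_peer(void *util, const char *str)
{
	struct toy_info *info = util;

	(void)str;
	if (info && info->peer)
		peer_id_sum += info->peer->id;
}

static void toy_clear_func(struct toy_item *items, size_t nr,
			   toy_clear_func_t clearfunc)
{
	size_t i;

	/* Pass 1: run the callback on every item; nothing is freed yet. */
	for (i = 0; i < nr; i++)
		if (clearfunc)
			clearfunc(items[i].util, items[i].string);

	/* Pass 2: only now release the util pointers themselves. */
	for (i = 0; i < nr; i++) {
		free(items[i].util);
		items[i].util = NULL;
	}
}
```

In a single walk that frees each util right after its callback, the
callback for the second item would read the first item's util after it
had been freed. Whether real callers need that is exactly the open
question.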

> diff --git a/strmap.h b/strmap.h
> index 1e152d832d6..337f6278e86 100644
> --- a/strmap.h
> +++ b/strmap.h
> @@ -51,12 +51,25 @@ void strmap_init_with_options(struct strmap *map,
>   */
>  void strmap_clear(struct strmap *map, int free_values);
>
> +/**
> + * To strmap_clear() what string_list_clear_strings() is to
> + * string_list_clear(). I.e. free your keys too, which we used as-is
> + * without `strdup_strings = 1`.
> + */
> +void strmap_clear_strings(struct strmap *map, int free_values);

strmap.h doesn't depend on string-list.h, so the comment should be
self-standing.  The analogy also doesn't seem to hold since we do NOT
need to free the keys when strdup_strings is 1; Peff suggested
FLEXPTR_ALLOC_STR specifically to avoid that extra allocation in that
case.

> +
>  /*
>   * Similar to strmap_clear() but leaves map->map->table allocated and
>   * pre-sized so that subsequent uses won't need as many rehashings.
>   */
>  void strmap_partial_clear(struct strmap *map, int free_values);
>
> +/**
> + * To strmap_partial_clear() what string_list_clear_strings() is to
> + * string_list_clear(). See strmap_clear_strings() above.
> + */
> +void strmap_partial_clear_strings(struct strmap *map, int free_values);
> +

Same comment as above for strmap_clear_strings() applies here.


* Re: [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  2021-07-30  2:27         ` Elijah Newren
@ 2021-07-30 16:12           ` Jeff King
  0 siblings, 0 replies; 65+ messages in thread
From: Jeff King @ 2021-07-30 16:12 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Jul 29, 2021 at 08:27:51PM -0600, Elijah Newren wrote:

> > FWIW, I had the same thought. You can also provide a helper to make the
> > freeing side nicer:
> >
> >   static void mem_pool_free(struct mem_pool *m, void *ptr)
> >   {
> >         if (m)
> >                 return; /* will be freed when pool frees */
> >         free(ptr);
> >   }
> >
> > We do something similar with unuse_commit_buffer(), where the caller
> > isn't aware of whether we pulled the buffer from cache or allocated it
> > especially for them.
> 
> Having a paired function may help one side, but I worry that the name
> (mem_pool_free) might introduce some confusion of its own -- "Why is
> there a mem_pool_free() function, isn't the point of memory pools to
> not need to individually free things?"  Or, "Why are they freeing the
> pool here and what's the extra parameter?"

Yeah, "mem_pool_maybe_free" or something might explain it. But...

> I'm not sure I see the right way to address that, so I think I'm going
> to leave this part out of my series and let someone else add such
> changes on top if they feel motivated to do so.

That's fine, especially as dropping the conditional USE_MEMORY_POOL flag
means these functions will go away entirely.
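
For reference, the "maybe free" helper discussed above could be sketched as
follows. This is only an illustration: `toy_pool` and `toy_pool_maybe_free`
are hypothetical names, not part of git's mem-pool API, and the helper
returns an int (unlike the void version quoted above) purely so that its
behavior is observable.

```c
#include <stdlib.h>

/* Hypothetical stand-in for git's struct mem_pool; only its presence matters. */
struct toy_pool {
	int unused;
};

/*
 * Release ptr unless it was allocated from a pool.
 * Returns 1 if free() was called, 0 if the memory is left for the pool
 * to release in bulk when the pool itself is discarded.
 */
static int toy_pool_maybe_free(struct toy_pool *pool, void *ptr)
{
	if (pool)
		return 0; /* will be freed when the pool is discarded */
	free(ptr);
	return 1;
}
```

Callers can then invoke the helper unconditionally instead of wrapping each
free() in an "if (!pool)" check, which is the same shape as the
unuse_commit_buffer() pattern mentioned in the quoted text.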

-Peff


* Re: [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools
  2021-07-30  2:30           ` Elijah Newren
@ 2021-07-30 16:12             ` Jeff King
  0 siblings, 0 replies; 65+ messages in thread
From: Jeff King @ 2021-07-30 16:12 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee

On Thu, Jul 29, 2021 at 08:30:23PM -0600, Elijah Newren wrote:

> > What if there was a flags field? Then it could be combined with the
> > free_values parameter. The result is kind of verbose in two ways:
> >
> >  - now strset_clear(), etc, need a "flags" parameter, which they didn't
> >    before (and is just "0" most of the time!)
> >
> >  - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
> >    That's a lot longer, though arguably it's easier to understand since
> >    the boolean is explained.
> >
> > Having gone through the exercise, I am not sure it is actually making
> > anything more readable (messy patch is below for reference).
> 
> Thanks for diving in.  Since it's not clear if it's helping, I'll just
> take your earlier suggestion to rename the "strmap_func" variable to
> "strmap_clear_func" instead.

That sounds just fine with me. Thanks for considering my tangent. :)

-Peff


* Re: [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools
  2021-07-30 14:36             ` Elijah Newren
@ 2021-07-30 16:23               ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 65+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-30 16:23 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Jeff King, Elijah Newren via GitGitGadget, Git Mailing List,
	Eric Sunshine, Derrick Stolee


On Fri, Jul 30 2021, Elijah Newren wrote:

> Hi Ævar,
>
> On Fri, Jul 30, 2021 at 7:33 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Thu, Jul 29 2021, Jeff King wrote:
>>
>> > On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote:
>> >
>> >> > Arguably, the existence of these function indirections is perhaps a sign
>> >> > that the strmap API should provide a version of the clear functions that
>> >> > takes "partial / not-partial" as a parameter.
>> >>
>> >> Are you suggesting a modification of str{map,intmap,set}_clear() to
>> >> take an extra parameter, or removing the
>> >> str{map,intmap,set}_partial_clear() functions and introducing new
>> >> functions that take a partial/not-partial parameter?  I think you're
>> >> suggesting the latter, and that makes more sense to me...but I'm
>> >> drawing blanks trying to come up with a reasonable function name.
>> >
>> > It does seem a shame to add the "partial" parameter to strmap_clear(),
>> > just because most callers don't need it (so they end up with this
>> > inscrutable "0" parameter).
>> >
>> > What if there was a flags field? Then it could be combined with the
>> > free_values parameter. The result is kind of verbose in two ways:
>> >
>> >  - now strset_clear(), etc, need a "flags" parameter, which they didn't
>> >    before (and is just "0" most of the time!)
>> >
>> >  - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
>> >    That's a lot longer, though arguably it's easier to understand since
>> >    the boolean is explained.
>> >
>> > Having gone through the exercise, I am not sure it is actually making
>> > anything more readable (messy patch is below for reference).
>>
>> I've got some WIP patches for string-list.h and strmap.h to make the API
>> nicer, and it's probably applicable to strset.h too.
>
> There is no strset.h; strset and strintmap along with strmap are part
> of strmap.h.
>
>> I.e. I found when using strset.h that it was a weird API to use, because
>> unlike string-list.h it didn't pay attention to your "dup" field when
>> freeing, you had to do it explicitly.
>
> Do you mean strmap.h instead of strset.h?

Yes, sorry. Brainfart.

> In general, if you are asking strmap/strset/strintmap to dup your keys
> and are explicitly freeing the strings, then you are misusing the API
> and either freeing pointers that were never allocated or getting
> double frees.  It's wrong to explicitly deallocate them because:
>   * When using a pool, we just allocate from the pool.  The memory
> will be freed when the pool is freed.
>   * When not using a pool, we use FLEXPTR_ALLOC_STR in order to make
> the string be part of the allocated strmap_entry.  The string's memory
> is deallocated when the strmap_entry is.
>
> The only reason to explicitly free keys in a strmap/strset/strintmap
> is if you do NOT have strdup_strings set and allocated the strings
> elsewhere and left your strmap as the only thing tracking the strings.

Yes, sorry. I think I was trying to address that case, i.e. it started
with fixing some memory leaks, but it's part of a branch of mine that's
in a messy WIP state of not passing the tests. Please ignore the rest of
the hunks to do with strmap.h.

I might have some worthwhile fixes there, maybe not. It mainly started
with carrying things over from similar changes in string-list.h.
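
The key-ownership rule Elijah describes in the quoted text (a duplicated key
lives inside the entry's own allocation, so it must never be freed
separately) can be illustrated with a small sketch. The `toy_entry` type and
its helpers are hypothetical; git's real strmap_entry and the
FLEXPTR_ALLOC_STR macro differ in detail.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type, for illustration only. */
struct toy_entry {
	const char *key;   /* points at keydata when the key was copied */
	void *value;
	char keydata[];    /* C99 flexible array member holding the copy */
};

/* Copy the key into the same allocation as the entry: one malloc, one free. */
static struct toy_entry *toy_entry_new_dup(const char *key, void *value)
{
	size_t len = strlen(key);
	struct toy_entry *e = malloc(sizeof(*e) + len + 1);

	if (!e)
		return NULL;
	memcpy(e->keydata, key, len + 1); /* +1 to include the NUL byte */
	e->key = e->keydata;
	e->value = value;
	return e;
}

/*
 * Freeing the entry releases the key as well. Calling
 * free((char *)e->key) here would be a bug, because the key is not a
 * separate allocation.
 */
static void toy_entry_free(struct toy_entry *e)
{
	free(e);
}
```

Only when the caller stores its own pointer (the strdup_strings = 0 case) is
there a separate key allocation that somebody has to free.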

>> And then in e.g. merge-ort.c there's this "strdup dance" pattern where
>> we flip the field back and forth.
>>
>> The below diff is extracted from that WIP work, with the relevant two
>> API headers and then two changed API users for show (the tree-wide
>> changes are much larger).
>>
>> I think making the promise I make in the updated docs at "We guarantee
>> that the `clearfunc`[...]" in string-list.h makes for particularly nice
>> API behavior.
>>
>>  builtin/remote.c | 37 ++++++++++++++++++++---------------
>>  merge-ort.c      | 32 +++++++-----------------------
>>  string-list.h    | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  strmap.h         | 13 +++++++++++++
>>  4 files changed, 98 insertions(+), 43 deletions(-)
>>
>> diff --git a/builtin/remote.c b/builtin/remote.c
>> index 7f88e6ce9de..ec1dbd49f71 100644
>> --- a/builtin/remote.c
>> +++ b/builtin/remote.c
>> @@ -340,10 +340,24 @@ static void read_branches(void)
>>
>>  struct ref_states {
>>         struct remote *remote;
>> -       struct string_list new_refs, stale, tracked, heads, push;
>> +
>> +       struct string_list new_refs;
>> +       struct string_list stale;
>> +       struct string_list tracked;
>> +       struct string_list heads;
>> +       struct string_list push;
>> +
>>         int queried;
>>  };
>>
>> +#define REF_STATES_INIT { \
>> +       .new_refs = STRING_LIST_INIT_DUP, \
>> +       .stale = STRING_LIST_INIT_DUP, \
>> +       .tracked = STRING_LIST_INIT_DUP, \
>> +       .heads = STRING_LIST_INIT_DUP, \
>> +       .push = STRING_LIST_INIT_DUP, \
>> +}
>> +
>>  static int get_ref_states(const struct ref *remote_refs, struct ref_states *states)
>>  {
>>         struct ref *fetch_map = NULL, **tail = &fetch_map;
>> @@ -355,9 +369,6 @@ static int get_ref_states(const struct ref *remote_refs, struct ref_states *stat
>>                         die(_("Could not get fetch map for refspec %s"),
>>                                 states->remote->fetch.raw[i]);
>>
>> -       states->new_refs.strdup_strings = 1;
>> -       states->tracked.strdup_strings = 1;
>> -       states->stale.strdup_strings = 1;
>>         for (ref = fetch_map; ref; ref = ref->next) {
>>                 if (!ref->peer_ref || !ref_exists(ref->peer_ref->name))
>>                         string_list_append(&states->new_refs, abbrev_branch(ref->name));
>> @@ -406,7 +417,6 @@ static int get_push_ref_states(const struct ref *remote_refs,
>>
>>         match_push_refs(local_refs, &push_map, &remote->push, MATCH_REFS_NONE);
>>
>> -       states->push.strdup_strings = 1;
>>         for (ref = push_map; ref; ref = ref->next) {
>>                 struct string_list_item *item;
>>                 struct push_info *info;
>> @@ -449,7 +459,6 @@ static int get_push_ref_states_noquery(struct ref_states *states)
>>         if (remote->mirror)
>>                 return 0;
>>
>> -       states->push.strdup_strings = 1;
>>         if (!remote->push.nr) {
>>                 item = string_list_append(&states->push, _("(matching)"));
>>                 info = item->util = xcalloc(1, sizeof(struct push_info));
>> @@ -483,7 +492,6 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat
>>         refspec.force = 0;
>>         refspec.pattern = 1;
>>         refspec.src = refspec.dst = "refs/heads/*";
>> -       states->heads.strdup_strings = 1;
>>         get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
>>         matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"),
>>                                     fetch_map, 1);
>> @@ -905,7 +913,7 @@ static void clear_push_info(void *util, const char *string)
>>  {
>>         struct push_info *info = util;
>>         free(info->dest);
>> -       free(info);
>> +       /* note: fixed memleak here */
>>  }
>>
>>  static void free_remote_ref_states(struct ref_states *states)
>> @@ -1159,7 +1167,7 @@ static int get_one_entry(struct remote *remote, void *priv)
>>                 string_list_append(list, remote->name)->util =
>>                                 strbuf_detach(&url_buf, NULL);
>>         } else
>> -               string_list_append(list, remote->name)->util = NULL;
>> +               string_list_append(list, remote->name);
>>         if (remote->pushurl_nr) {
>>                 url = remote->pushurl;
>>                 url_nr = remote->pushurl_nr;
>> @@ -1179,10 +1187,9 @@ static int get_one_entry(struct remote *remote, void *priv)
>>
>>  static int show_all(void)
>>  {
>> -       struct string_list list = STRING_LIST_INIT_NODUP;
>> +       struct string_list list = STRING_LIST_INIT_DUP;
>>         int result;
>>
>> -       list.strdup_strings = 1;
>>         result = for_each_remote(get_one_entry, &list);
>>
>>         if (!result) {
>> @@ -1212,7 +1219,7 @@ static int show(int argc, const char **argv)
>>                 OPT_BOOL('n', NULL, &no_query, N_("do not query remotes")),
>>                 OPT_END()
>>         };
>> -       struct ref_states states;
>> +       struct ref_states states = REF_STATES_INIT;
>>         struct string_list info_list = STRING_LIST_INIT_NODUP;
>>         struct show_info info;
>>
>> @@ -1334,8 +1341,7 @@ static int set_head(int argc, const char **argv)
>>         if (!opt_a && !opt_d && argc == 2) {
>>                 head_name = xstrdup(argv[1]);
>>         } else if (opt_a && !opt_d && argc == 1) {
>> -               struct ref_states states;
>> -               memset(&states, 0, sizeof(states));
>> +               struct ref_states states = REF_STATES_INIT;
>>                 get_remote_ref_states(argv[0], &states, GET_HEAD_NAMES);
>>                 if (!states.heads.nr)
>>                         result |= error(_("Cannot determine remote HEAD"));
>> @@ -1374,14 +1380,13 @@ static int set_head(int argc, const char **argv)
>>  static int prune_remote(const char *remote, int dry_run)
>>  {
>>         int result = 0;
>> -       struct ref_states states;
>> +       struct ref_states states = REF_STATES_INIT;
>>         struct string_list refs_to_prune = STRING_LIST_INIT_NODUP;
>>         struct string_list_item *item;
>>         const char *dangling_msg = dry_run
>>                 ? _(" %s will become dangling!")
>>                 : _(" %s has become dangling!");
>>
>> -       memset(&states, 0, sizeof(states));
>>         get_remote_ref_states(remote, &states, GET_REF_STATES);
>>
>>         if (!states.stale.nr) {
>
> Everything up to here looks like a very nice cleanup.
>
>> diff --git a/merge-ort.c b/merge-ort.c
>> index ec0c5904211..53ed78e7a01 100644
>> --- a/merge-ort.c
>> +++ b/merge-ort.c
>> @@ -432,16 +432,6 @@ struct conflict_info {
>>         assert((ci) && !(mi)->clean);        \
>>  } while (0)
>>
>> -static void free_strmap_strings(struct strmap *map)
>> -{
>> -       struct hashmap_iter iter;
>> -       struct strmap_entry *entry;
>> -
>> -       strmap_for_each_entry(map, &iter, entry) {
>> -               free((char*)entry->key);
>> -       }
>> -}
>> -
>>  static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>>                                           int reinitialize)
>>  {
>> @@ -455,13 +445,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>>                 reinitialize ? strset_partial_clear : strset_clear;
>>
>>         /*
>> -        * We marked opti->paths with strdup_strings = 0, so that we
>> -        * wouldn't have to make another copy of the fullpath created by
>> -        * make_traverse_path from setup_path_info().  But, now that we've
>> -        * used it and have no other references to these strings, it is time
>> -        * to deallocate them.
>> +        * We used the the pattern of re-using already allocated
>> +        * strings strmap_clear_strings() in make_traverse_path from
>> +        * setup_path_info(). Deallocate them.
>>          */
>> -       free_strmap_strings(&opti->paths);
>> +       strmap_clear_strings(&opti->paths, 0);
>>         strmap_func(&opti->paths, 1);
>>
>>         /*
>
> It's not clear to me that strmap should handle the freeing of the keys
> at all; maybe it should and strmap_clear_strings() makes sense to
> introduce.  However, this change is clearly wrong regardless, for two
> reasons: (1) You are double clearing since strmap_func() is also
> called afterwards, and (2) you are also ignoring the potential partial
> bit since strmap_func might be strmap_partial_clear() rather than
> strmap_clear().

*Nod*, see above.

>> @@ -472,15 +460,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>>         strmap_func(&opti->conflicted, 0);
>>
>>         /*
>> -        * opti->paths_to_free is similar to opti->paths; we created it with
>> -        * strdup_strings = 0 to avoid making _another_ copy of the fullpath
>> -        * but now that we've used it and have no other references to these
>> -        * strings, it is time to deallocate them.  We do so by temporarily
>> -        * setting strdup_strings to 1.
>> +        * opti->paths_to_free is similar to opti->paths; it's memory
>> +        * we borrowed and need to free with string_list_clear_strings().
>>          */
>> -       opti->paths_to_free.strdup_strings = 1;
>> -       string_list_clear(&opti->paths_to_free, 0);
>> -       opti->paths_to_free.strdup_strings = 0;
>> +       string_list_clear_strings(&opti->paths_to_free, 0);
>
> This is very nice.  I really like this new function and API.
>
>>         if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
>>                 discard_index(&opti->attr_index);
>> @@ -2664,7 +2647,6 @@ static int collect_renames(struct merge_options *opt,
>>          * and have no other references to these strings, it is time to
>>          * deallocate them.
>>          */
>> -       free_strmap_strings(&collisions);
>>         strmap_clear(&collisions, 1);
>>         return clean;
>>  }
>
> This hunk is wrong.

*nod*

>> diff --git a/string-list.h b/string-list.h
>> index 0d6b4692396..9eeea996888 100644
>> --- a/string-list.h
>> +++ b/string-list.h
>> @@ -109,6 +109,9 @@ void string_list_init_dup(struct string_list *list);
>>   */
>>  void string_list_init(struct string_list *list, int strdup_strings);
>>
>> +void string_list_cmp_init(struct string_list *list, int strdup_strings,
>> +                         compare_strings_fn cmp);
>> +
>
> Seems unrelated to what you were trying to highlight?

Yes, sorry. I just extracted all the WIP diff I had for
string-list.h; this bit was unrelated.

It's unrelated cleanup of various things that hardcoded all the fields
during their "init", just because they need strcasecmp or whatever
instead of strcmp.

>>  /** Callback function type for for_each_string_list */
>>  typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
>>
>> @@ -129,14 +132,66 @@ void filter_string_list(struct string_list *list, int free_util,
>>   */
>>  void string_list_clear(struct string_list *list, int free_util);
>>
>> +/**
>> + * Free a string list initialized without `strdup_strings = 1`, but
>> + * where we also want to free() the strings. You usually want to just
>> + * use string_list_clear() after initializing with
>> + * `STRING_LIST_INIT_DUP' instead.
>> + *
>> + * Useful to free e.g. a string list whose strings came from
>> + * strbuf_detach() or other memory that we didn't initially allocate
>> + * on the heap, but which we now manage.
>> + *
>> + * Under the hood this is identical in behavior to temporarily setting
>> + * `strbuf_strings` to `1` for the duration of this function call, but
>> + * without the verbosity of performing that dance yourself.
>> + */
>> +void string_list_clear_strings(struct string_list *list, int free_util);
>> +
>> +/**
>> + * Clear only the `util` pointer, but not the `string`, even if
>> + * `strdup_strings = 1` is set. Useful for the idiom of doing e.g.:
>> + *
>> + *    string_list_append(&list, str + offs)->util = str;
>> + *
>> + * Where we add a string at some offset, own the string (so
>> + * effectively `strdup_strings = `), but can't free() the string
>> + * itself at the changed offset, but need to free the original data in
>> + * `util` instead.
>> + */
>> +void string_list_clear_util(struct string_list *list);
>> +
>>  /**
>>   * Callback type for `string_list_clear_func`.  The string associated
>>   * with the util pointer is passed as the second argument
>>   */
>>  typedef void (*string_list_clear_func_t)(void *p, const char *str);
>>
>> -/** Call a custom clear function on each util pointer */
>> -void string_list_clear_func(struct string_list *list, string_list_clear_func_t clearfunc);
>> +/**
>> + * Like string_list_clear() except that it first calls a custom clear
>> + * function on each util pointer.
>> + *
>> + * We guarantee that the `clearfunc` will be called on all util
>> + * pointers in a list before we proceed to free the first string or
>> + * util pointer, i.e. should you need to it's OK to peek at other util
>> + * items in the list itself, or to otherwise iterate it from within
>> + * the `clearfunc`.
>> + *
>> + * You do not need to free() the passed-in util pointer itself,
>> + * i.e. after calling all `clearfunc` this has the seme behavior as
>> + * string_list_clear() called with with `free_util = 1`.
>> + */
>> +void string_list_clear_func(struct string_list *list,
>> +                           string_list_clear_func_t clearfunc);
>> +
>> +/**
>> + * Like string_list_clear_func() but free the strings too, using the
>> + * same dance as described for string_list_clear_strings()
>> + * above. You'll usually want to initialize with
>> + * `STRING_LIST_INIT_DUP` and use string_list_clear_strings() instead.
>> + */
>> +void string_list_clear_strings_func(struct string_list *list,
>> +                                   string_list_clear_func_t clearfunc);
>>
>>  /**
>>   * Apply `func` to each item. If `func` returns nonzero, the
>
> string_list_clear_strings() looks very nice.  The others are probably
> good too, though I'm curious about the need for double walking the
> list to free it instead of doing it in a single walk; what callers
> need to walk the list and check out other values?

I think there were a couple of users that needed that, but maybe I'm
wrong. I think even if there aren't any, clearly defining the callback
freeing semantics makes sense for caller sanity.

We double-walk now in string_list_clear(), this is mostly documenting
and extending current behavior to being able to clear any arbitrary
combination of string/util with an optional cb, regardless of your
strdup_strings state.
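
A minimal sketch of the two-pass semantics described above, assuming a
simplified list type: `toy_list`, `toy_item`, `toy_list_clear_func` and
`toy_count_cb` are hypothetical names, not the string-list.h API, and the
`free_strings` parameter stands in for "regardless of your strdup_strings
state".

```c
#include <stdlib.h>

struct toy_item {
	char *string;
	void *util;
};

struct toy_list {
	struct toy_item *items;
	size_t nr;
};

typedef void (*toy_clear_fn)(void *util, const char *string);

/* Example callback: counts how many items it was called on. */
static int toy_calls;
static void toy_count_cb(void *util, const char *string)
{
	(void)util;
	(void)string;
	toy_calls++;
}

static void toy_list_clear_func(struct toy_list *list, toy_clear_fn fn,
				int free_strings)
{
	size_t i;

	/* Pass 1: callbacks run while every string and util is still valid. */
	if (fn)
		for (i = 0; i < list->nr; i++)
			fn(list->items[i].util, list->items[i].string);

	/* Pass 2: only now release the util pointers and, optionally, strings. */
	for (i = 0; i < list->nr; i++) {
		free(list->items[i].util);
		if (free_strings)
			free(list->items[i].string);
	}
	free(list->items);
	list->items = NULL;
	list->nr = 0;
}
```

Because nothing is freed until the first loop finishes, a callback may safely
look at other items in the list, which is the guarantee the proposed doc
comment spells out.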

>> diff --git a/strmap.h b/strmap.h
>> index 1e152d832d6..337f6278e86 100644
>> --- a/strmap.h
>> +++ b/strmap.h
>> @@ -51,12 +51,25 @@ void strmap_init_with_options(struct strmap *map,
>>   */
>>  void strmap_clear(struct strmap *map, int free_values);
>>
>> +/**
>> + * To strmap_clear() what string_list_clear_strings() is to
>> + * string_list_clear(). I.e. free your keys too, which we used as-is
>> + * without `strdup_strings = 1`.
>> + */
>> +void strmap_clear_strings(struct strmap *map, int free_values);
>
> strmap.h doesn't depend on string-list.h, so the comment should be
> self-standing.  The analogy also doesn't seem to hold since we do NOT
> need to free the keys when strdup_strings is 1; Peff suggested
> FLEXPTR_ALLOC_STR specifically to avoid that extra allocation in that
> case.

*nod*, see above (i.e. maybe all this strmap.h stuff is wrong). FWIW
that comment was meant as a "if you're familiar with X in API A, this Y
in B works similarly".

>> +
>>  /*
>>   * Similar to strmap_clear() but leaves map->map->table allocated and
>>   * pre-sized so that subsequent uses won't need as many rehashings.
>>   */
>>  void strmap_partial_clear(struct strmap *map, int free_values);
>>
>> +/**
>> + * To strmap_partial_clear() what string_list_clear_strings() is to
>> + * string_list_clear(). See strmap_clear_strings() above.
>> + */
>> +void strmap_partial_clear_strings(struct strmap *map, int free_values);
>> +
>
> Same comment as above for strmap_clear_strings() applies here.

*nod*


* Re: [PATCH v3 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools
  2021-07-30 11:47     ` [PATCH v3 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools Elijah Newren via GitGitGadget
@ 2021-07-30 16:24       ` Jeff King
  0 siblings, 0 replies; 65+ messages in thread
From: Jeff King @ 2021-07-30 16:24 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee

On Fri, Jul 30, 2021 at 11:47:44AM +0000, Elijah Newren via GitGitGadget wrote:

> -#if USE_MEMORY_POOL
>  	mem_pool_init(&opt->priv->internal_pool, 0);
> -	opt->priv->pool = &opt->priv->internal_pool;
> -#else
> -	opt->priv->pool = NULL;
> -#endif
> -	pool = opt->priv->pool;
> +	pool = opt->priv->pool = &opt->priv->internal_pool;

Since opt->priv->pool always points at internal_pool now, I think we can
simplify much more. Every "if (!opt->priv->pool)" can go away, and in
turn paths_to_free does, too. Those are really the spots I was most
worried about in terms of complexity.

An easy way to find these spots is to get rid of internal_pool, and just
make "pool" the struct. Then the compiler helpfully complains about all
of the places that check the boolean value of a struct. :)
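
To illustrate the trick, here is a rough sketch with a toy bump allocator.
`toy_pool`, `toy_priv` and the helpers are hypothetical and far simpler than
git's struct mem_pool (which grows on demand); the point is only the shape
of the embedding struct.

```c
#include <stddef.h>

/* Hypothetical fixed-size bump allocator, for illustration only. */
struct toy_pool {
	unsigned char buf[256];
	size_t used;
};

struct toy_priv {
	/*
	 * Embedded by value. With a "struct toy_pool *pool" member,
	 * "if (!priv->pool)" compiles; with this member it is a hard
	 * error, because a struct is not a scalar. Every leftover
	 * NULL check is therefore reported by the compiler.
	 */
	struct toy_pool pool;
};

static void *toy_pool_alloc(struct toy_pool *pool, size_t len)
{
	void *ret;

	len = (len + 7) & ~(size_t)7; /* round the size up to a multiple of 8 */
	if (len > sizeof(pool->buf) - pool->used)
		return NULL; /* toy pool is full; a real pool would grow */
	ret = pool->buf + pool->used;
	pool->used += len;
	return ret;
}

/* Everything allocated from the pool is released in one step. */
static void toy_pool_discard(struct toy_pool *pool)
{
	pool->used = 0;
}
```

Call sites then pass "&priv->pool" everywhere, which mirrors the mechanical
"opt->priv->pool" to "&opt->priv->pool" changes in the patch below.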

The patch below is from a fairly mechanical conversion I did. All of the
spots were found by the compiler, except the one in use_cached_pairs (it
assigns to a local "pool" pointer, which is always non-NULL, but that's
not necessarily obvious to the compiler).

You might spot further opportunities for cleanup, as somebody who's more
familiar with the allocation patterns (I happened to notice manually
that paths_to_free is not needed anymore, but I don't know if there are
any other subtle bits).

-Peff

 merge-ort.c | 151 +++++++++--------------------------
 1 file changed, 37 insertions(+), 114 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 63829f5cac..441dc4e094 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -303,8 +303,6 @@ struct merge_options_internal {
 	 *   * these keys serve to intern all the path strings, which allows
 	 *     us to do pointer comparison on directory names instead of
 	 *     strcmp; we just have to be careful to use the interned strings.
-	 *     (Technically paths_to_free may track some strings that were
-	 *      removed from froms paths.)
 	 *
 	 * The values of paths:
 	 *   * either a pointer to a merged_info, or a conflict_info struct
@@ -347,18 +345,7 @@ struct merge_options_internal {
 	 * freed together too.  Using a memory pool for these provides a
 	 * nice speedup.
 	 */
-	struct mem_pool internal_pool;
-	struct mem_pool *pool; /* NULL, or pointer to internal_pool */
-
-	/*
-	 * paths_to_free: additional list of strings to free
-	 *
-	 * If keys are removed from "paths", they are added to paths_to_free
-	 * to ensure they are later freed.  We avoid free'ing immediately since
-	 * other places (e.g. conflict_info.pathnames[]) may still be
-	 * referencing these paths.
-	 */
-	struct string_list paths_to_free;
+	struct mem_pool pool;
 
 	/*
 	 * output: special messages and conflict notices for various paths
@@ -537,19 +524,7 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	void (*strset_clear_func)(struct strset *) =
 		reinitialize ? strset_partial_clear : strset_clear;
 
-	if (opti->pool)
-		strmap_clear_func(&opti->paths, 0);
-	else {
-		/*
-		 * We marked opti->paths with strdup_strings = 0, so that
-		 * we wouldn't have to make another copy of the fullpath
-		 * created by make_traverse_path from setup_path_info().
-		 * But, now that we've used it and have no other references
-		 * to these strings, it is time to deallocate them.
-		 */
-		free_strmap_strings(&opti->paths);
-		strmap_clear_func(&opti->paths, 1);
-	}
+	strmap_clear_func(&opti->paths, 0);
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
@@ -558,20 +533,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	 */
 	strmap_clear_func(&opti->conflicted, 0);
 
-	if (!opti->pool) {
-		/*
-		 * opti->paths_to_free is similar to opti->paths; we
-		 * created it with strdup_strings = 0 to avoid making
-		 * _another_ copy of the fullpath but now that we've used
-		 * it and have no other references to these strings, it is
-		 * time to deallocate them.  We do so by temporarily
-		 * setting strdup_strings to 1.
-		 */
-		opti->paths_to_free.strdup_strings = 1;
-		string_list_clear(&opti->paths_to_free, 0);
-		opti->paths_to_free.strdup_strings = 0;
-	}
-
 	if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
 		discard_index(&opti->attr_index);
 
@@ -621,9 +582,7 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_clear(&opti->output, 0);
 	}
 
-	mem_pool_discard(&opti->internal_pool, 0);
-	if (!reinitialize)
-		opti->pool = NULL;
+	mem_pool_discard(&opti->pool, 0);
 
 	/* Clean out callback_data as well. */
 	FREE_AND_NULL(renames->callback_data);
@@ -846,7 +805,7 @@ static void setup_path_info(struct merge_options *opt,
 	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
 	assert(resolved == (merged_version != NULL));
 
-	mi = mem_pool_calloc(opt->priv->pool, 1,
+	mi = mem_pool_calloc(&opt->priv->pool, 1,
 			     resolved ? sizeof(struct merged_info) :
 					sizeof(struct conflict_info));
 	mi->directory_name = current_dir_name;
@@ -895,7 +854,7 @@ static void add_pair(struct merge_options *opt,
 		     unsigned dir_rename_mask)
 {
 	struct diff_filespec *one, *two;
-	struct mem_pool *pool = opt->priv->pool;
+	struct mem_pool *pool = &opt->priv->pool;
 	struct rename_info *renames = &opt->priv->renames;
 	int names_idx = is_add ? side : 0;
 
@@ -1141,7 +1100,7 @@ static int collect_merge_info_callback(int n,
 	len = traverse_path_len(info, p->pathlen);
 
 	/* +1 in both of the following lines to include the NUL byte */
-	fullpath = mem_pool_alloc(opt->priv->pool, len + 1);
+	fullpath = mem_pool_alloc(&opt->priv->pool, len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
@@ -1396,7 +1355,7 @@ static int handle_deferred_entries(struct merge_options *opt,
 		copy = renames->deferred[side].possible_trivial_merges;
 		strintmap_init_with_options(&renames->deferred[side].possible_trivial_merges,
 					    0,
-					    opt->priv->pool,
+					    &opt->priv->pool,
 					    0);
 		strintmap_for_each_entry(&copy, &iter, entry) {
 			const char *path = entry->key;
@@ -2348,19 +2307,15 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 	VERIFY_CI(ci);
 
 	/* Find parent directories missing from opt->priv->paths */
-	if (opt->priv->pool) {
-		cur_path = mem_pool_strdup(opt->priv->pool, new_path);
-		free((char*)new_path);
-		new_path = (char *)cur_path;
-	} else {
-		cur_path = new_path;
-	}
+	cur_path = mem_pool_strdup(&opt->priv->pool, new_path);
+	free((char*)new_path);
+	new_path = (char *)cur_path;
 
 	while (1) {
 		/* Find the parent directory of cur_path */
 		char *last_slash = strrchr(cur_path, '/');
 		if (last_slash) {
-			parent_name = mem_pool_strndup(opt->priv->pool,
+			parent_name = mem_pool_strndup(&opt->priv->pool,
 						       cur_path,
 						       last_slash - cur_path);
 		} else {
@@ -2371,8 +2326,6 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		/* Look it up in opt->priv->paths */
 		entry = strmap_get_entry(&opt->priv->paths, parent_name);
 		if (entry) {
-			if (!opt->priv->pool)
-				free((char*)parent_name);
 			parent_name = entry->key; /* reuse known pointer */
 			break;
 		}
@@ -2399,16 +2352,6 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		parent_name = cur_dir;
 	}
 
-	if (!opt->priv->pool) {
-		/*
-		 * We are removing old_path from opt->priv->paths.
-		 * old_path also will eventually need to be freed, but it
-		 * may still be used by e.g.  ci->pathnames.  So, store it
-		 * in another string-list for now.
-		 */
-		string_list_append(&opt->priv->paths_to_free, old_path);
-	}
-
 	assert(ci->filemask == 2 || ci->filemask == 4);
 	assert(ci->dirmask == 0);
 	strmap_remove(&opt->priv->paths, old_path, 0);
@@ -2442,8 +2385,6 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		new_ci->stages[index].mode = ci->stages[index].mode;
 		oidcpy(&new_ci->stages[index].oid, &ci->stages[index].oid);
 
-		if (!opt->priv->pool)
-			free(ci);
 		ci = new_ci;
 	}
 
@@ -2859,7 +2800,7 @@ static void use_cached_pairs(struct merge_options *opt,
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *entry;
-	struct mem_pool *pool = opt->priv->pool;
+	struct mem_pool *pool = &opt->priv->pool;
 
 	/*
 	 * Add to side_pairs all entries from renames->cached_pairs[side_index].
@@ -2871,25 +2812,20 @@ static void use_cached_pairs(struct merge_options *opt,
 		const char *new_name = entry->value;
 		if (!new_name)
 			new_name = old_name;
-		if (pool) {
-			/*
-			 * cached_pairs has _copies* of old_name and new_name,
-			 * because it has to persist across merges.  When
-			 *   pool != NULL
-			 * pool_alloc_filespec() will just re-use the existing
-			 * filenames, which will also get re-used by
-			 * opt->priv->paths if they become renames, and then
-			 * get freed at the end of the merge, leaving the copy
-			 * in cached_pairs dangling.  Avoid this by making a
-			 * copy here.
-			 *
-			 * When pool == NULL, pool_alloc_filespec() calls
-			 * alloc_filespec(), which makes a copy; we don't want
-			 * to add another.
-			 */
-			old_name = mem_pool_strdup(pool, old_name);
-			new_name = mem_pool_strdup(pool, new_name);
-		}
+
+		/*
+		 * cached_pairs has _copies* of old_name and new_name,
+		 * because it has to persist across merges.
+		 *
+		 * pool_alloc_filespec() will just re-use the existing
+		 * filenames, which will also get re-used by
+		 * opt->priv->paths if they become renames, and then
+		 * get freed at the end of the merge, leaving the copy
+		 * in cached_pairs dangling.  Avoid this by making a
+		 * copy here.
+		 */
+		old_name = mem_pool_strdup(pool, old_name);
+		new_name = mem_pool_strdup(pool, new_name);
 
 		/* We don't care about oid/mode, only filenames and status */
 		one = pool_alloc_filespec(pool, old_name);
@@ -3002,7 +2938,7 @@ static int detect_regular_renames(struct merge_options *opt,
 	diff_queued_diff = renames->pairs[side_index];
 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diffcore_rename_extended(&diff_opts,
-				 opt->priv->pool,
+				 &opt->priv->pool,
 				 &renames->relevant_sources[side_index],
 				 &renames->dirs_removed[side_index],
 				 &renames->dir_rename_count[side_index],
@@ -3053,7 +2989,7 @@ static int collect_renames(struct merge_options *opt,
 
 		if (p->status != 'A' && p->status != 'R') {
 			possibly_cache_new_pair(renames, p, side_index, NULL);
-			pool_diff_free_filepair(opt->priv->pool, p);
+			pool_diff_free_filepair(&opt->priv->pool, p);
 			continue;
 		}
 
@@ -3066,7 +3002,7 @@ static int collect_renames(struct merge_options *opt,
 
 		possibly_cache_new_pair(renames, p, side_index, new_path);
 		if (p->status != 'R' && !new_path) {
-			pool_diff_free_filepair(opt->priv->pool, p);
+			pool_diff_free_filepair(&opt->priv->pool, p);
 			continue;
 		}
 
@@ -3184,7 +3120,7 @@ static int detect_and_process_renames(struct merge_options *opt,
 		side_pairs = &renames->pairs[s];
 		for (i = 0; i < side_pairs->nr; ++i) {
 			struct diff_filepair *p = side_pairs->queue[i];
-			pool_diff_free_filepair(opt->priv->pool, p);
+			pool_diff_free_filepair(&opt->priv->pool, p);
 		}
 	}
 
@@ -3197,7 +3133,7 @@ static int detect_and_process_renames(struct merge_options *opt,
 	if (combined.nr) {
 		int i;
 		for (i = 0; i < combined.nr; i++)
-			pool_diff_free_filepair(opt->priv->pool,
+			pool_diff_free_filepair(&opt->priv->pool,
 						combined.queue[i]);
 		free(combined.queue);
 	}
@@ -3672,7 +3608,7 @@ static void process_entry(struct merge_options *opt,
 		 * the directory to remain here, so we need to move this
 		 * path to some new location.
 		 */
-		new_ci = mem_pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
+		new_ci = mem_pool_calloc(&opt->priv->pool, 1, sizeof(*new_ci));
 
 		/* We don't really want new_ci->merged.result copied, but it'll
 		 * be overwritten below so it doesn't matter.  We also don't
@@ -3765,7 +3701,7 @@ static void process_entry(struct merge_options *opt,
 			const char *a_path = NULL, *b_path = NULL;
 			int rename_a = 0, rename_b = 0;
 
-			new_ci = mem_pool_alloc(opt->priv->pool,
+			new_ci = mem_pool_alloc(&opt->priv->pool,
 						sizeof(*new_ci));
 
 			if (S_ISREG(a_mode))
@@ -3835,19 +3771,8 @@ static void process_entry(struct merge_options *opt,
 				b_path = path;
 			strmap_put(&opt->priv->paths, b_path, new_ci);
 
-			if (rename_a && rename_b) {
+			if (rename_a && rename_b)
 				strmap_remove(&opt->priv->paths, path, 0);
-				/*
-				 * We removed path from opt->priv->paths.  path
-				 * will also eventually need to be freed if not
-				 * part of a memory pool...but it may still be
-				 * used by e.g. ci->pathnames.  So, store it in
-				 * another string-list for now in that case.
-				 */
-				if (!opt->priv->pool)
-					string_list_append(&opt->priv->paths_to_free,
-							   path);
-			}
 
 			/*
 			 * Do special handling for b_path since process_entry()
@@ -4454,8 +4379,8 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	/* Initialization of various renames fields */
 	renames = &opt->priv->renames;
-	mem_pool_init(&opt->priv->internal_pool, 0);
-	pool = opt->priv->pool = &opt->priv->internal_pool;
+	mem_pool_init(&opt->priv->pool, 0);
+	pool = &opt->priv->pool;
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
 					    NOT_RELEVANT, pool, 0);
@@ -4492,15 +4417,13 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * Although we initialize opt->priv->paths with strdup_strings=0,
 	 * that's just to avoid making yet another copy of an allocated
 	 * string.  Putting the entry into paths means we are taking
-	 * ownership, so we will later free it.  paths_to_free is similar.
+	 * ownership, so we will later free it.
 	 *
 	 * In contrast, conflicted just has a subset of keys from paths, so
 	 * we don't want to free those (it'd be a duplicate free).
 	 */
 	strmap_init_with_options(&opt->priv->paths, pool, 0);
 	strmap_init_with_options(&opt->priv->conflicted, pool, 0);
-	if (!opt->priv->pool)
-		string_list_init_nodup(&opt->priv->paths_to_free);
 
 	/*
 	 * keys & strbufs in output will sometimes need to outlive "paths",


* [PATCH v4 0/9] Final optimization batch (#15): use memory pools
  2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
                       ` (8 preceding siblings ...)
  2021-07-30 11:47     ` [PATCH v3 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools Elijah Newren via GitGitGadget
@ 2021-07-31 17:27     ` Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 1/9] merge-ort: rename str{map,intmap,set}_func() Elijah Newren via GitGitGadget
                         ` (10 more replies)
  9 siblings, 11 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-31 17:27 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Elijah Newren

This series textually depends on en/ort-perf-batch-14, but the ideas are
orthogonal to it and orthogonal to previous series. It can be reviewed
independently.

Changes since v1, addressing Eric's feedback:

 * Fixed a comment that became out-of-date in patch 1
 * Swapped commits 2 and 3 so that one can better motivate the other.

Changes since v2, addressing Peff's feedback:

 * Rebased on en/ort-perf-batch-14 (resolving a trivial conflict with the
   new string_list_init_nodup() usage)
 * Added a new preliminary patch renaming str*_func() to str*_clear_func()
 * Added a new final patch that hardcodes that we'll just use memory pools

Changes since v3, as per Peff's feedback:

 * Don't only remove the extra complexity from the USE_MEMORY_POOL #define;
   also remove the original bookkeeping complexity needed to track
   individual frees when not using a memory pool.

=== Basic Optimization idea ===

In this series, I make use of memory pools to get faster allocations and
deallocations for many data structures that tend to all be deallocated at
the same time anyway.
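
As a rough illustration of why this helps, here is a toy bump allocator.
It is only a sketch under stated assumptions: every toy_* name is
hypothetical, and this is not the code in git's mem-pool.c. Allocation is
a pointer bump inside a large block, and deallocation is one pass over a
handful of blocks instead of one free() per object.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy pool for illustration; NOT git's mem_pool. */
union toy_align { uintmax_t i; void *p; };

struct toy_block {
	struct toy_block *next;
	size_t used, size;
	union toy_align data[];	/* payload, suitably aligned */
};

struct toy_pool {
	struct toy_block *head;
	size_t block_size;
};

static void toy_pool_init(struct toy_pool *pool, size_t block_size)
{
	pool->head = NULL;
	pool->block_size = block_size ? block_size : 4096;
}

static void *toy_pool_alloc(struct toy_pool *pool, size_t len)
{
	const size_t a = sizeof(union toy_align);
	struct toy_block *b = pool->head;
	void *ret;

	/* Round up so the next allocation stays aligned. */
	len = (len + a - 1) / a * a;

	/* Grab a new block only when the current one is exhausted. */
	if (!b || b->size - b->used < len) {
		size_t size = len > pool->block_size ? len : pool->block_size;
		b = malloc(sizeof(*b) + size);
		if (!b)
			return NULL;
		b->next = pool->head;
		b->used = 0;
		b->size = size;
		pool->head = b;
	}
	ret = (char *)b->data + b->used;
	b->used += len;
	return ret;
}

static char *toy_pool_strdup(struct toy_pool *pool, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = toy_pool_alloc(pool, len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Everything allocated from the pool is released together. */
static void toy_pool_discard(struct toy_pool *pool)
{
	while (pool->head) {
		struct toy_block *next = pool->head->next;
		free(pool->head);
		pool->head = next;
	}
}
```

A caller would run toy_pool_init() once, allocate many strings and structs
from the pool, and call toy_pool_discard() once at the end, in the same
spirit as the mem_pool_init() and mem_pool_discard() calls in the patches.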

=== Results ===

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28), the
changes in just this series improve performance as follows:

                     Before Series           After Series
no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms


As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


=== Overall Results across all optimization work ===

This is my final prepared optimization series. It might be worth reviewing
how my optimizations fared overall, comparing the original merge-recursive
timings with three things: how much merge-recursive improved (as a
side-effect of optimizing merge-ort), how much improvement we would have
gotten from a hypothetical infinite parallelization of rename detection, and
what I achieved at the end with merge-ort:

                               Timings

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename    merge-ort
                 v2.30.0      current     detection     current
                ----------   ---------   -----------   ---------
no-renames:       18.912 s    18.030 s     11.699 s     198.3 ms
mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms

                           Speedup factors

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
no-renames:         1           1.05         1.6           95
mega-renames:       1          16.5         29           9012
just-one-mega:      1          13.6         20            565
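
For anyone who wants to check the arithmetic, each speedup factor appears
to be the merge-recursive v2.30.0 timing divided by the timing in the
column in question, both in seconds. A minimal sketch (the function name
is hypothetical, used only for illustration):

```c
/*
 * Speedup relative to the baseline: how many times faster the
 * optimized run is.  Both arguments must use the same unit.
 */
static double speedup_factor(double baseline_s, double optimized_s)
{
	return baseline_s / optimized_s;
}
```

For example, mega-renames went from 5964.031 s to 661.8 ms, so the call
would be speedup_factor(5964.031, 0.6618), which rounds to the 9012 shown
in the table.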


And, for partial clone users:

             Factor reduction in number of objects needed

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
mega-renames:       1            1            1          181.3


=== Caveat ===

It may be worth noting, though, that my optimization numbers above for
merge-ort use test-tool fast-rebase. git rebase -s ort on the three
testcases above is 5-20 times slower (taking 3.835s, 6.798s, and 1.235s,
respectively). At this point, any further optimization work should go into
making a faster full-featured rebase by copying the ideas from fast-rebase,
such as:

 * avoid unnecessary process forking
 * avoid updating the index and working copy until either the rebase is
   finished or you hit a conflict (and don't write rebase metadata to disk
   until that point either)
 * get rid of the glacially slow revision walking of the upstream side of
   history (nuke can_fast_forward(), make --reapply-cherry-picks the
   default), or at least don't revision walk so many times (multiple calls
   to get_merge_bases in can_fast_forward() plus an is_linear_history()
   walk, checking for upstream cherry-picks, probably more)
 * turn off per-commit hooks that probably should have never been on
   anyway

Elijah Newren (9):
  merge-ort: rename str{map,intmap,set}_func()
  diffcore-rename: use a mem_pool for exact rename detection's hashmap
  merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  merge-ort: set up a memory pool
  merge-ort: switch our strmaps over to using memory pools
  diffcore-rename, merge-ort: add wrapper functions for filepair
    alloc/dealloc
  merge-ort: store filepairs and filespecs in our mem_pool
  merge-ort: reuse path strings in pool_alloc_filespec
  merge-ort: remove compile-time ability to turn off usage of memory
    pools

 diffcore-rename.c |  68 ++++++++++++++---
 diffcore.h        |   3 +
 merge-ort.c       | 188 +++++++++++++++++++++++++---------------------
 3 files changed, 165 insertions(+), 94 deletions(-)


base-commit: 8b09a900a1f1f00d4deb04f567994ae8f1804b5e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-990%2Fnewren%2Fort-perf-batch-15-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-990/newren/ort-perf-batch-15-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/990

Range-diff vs v3:

  1:  e075d985f26 =  1:  e075d985f26 merge-ort: rename str{map,intmap,set}_func()
  2:  8416afa89fb =  2:  8416afa89fb diffcore-rename: use a mem_pool for exact rename detection's hashmap
  3:  2c0b90eaba5 =  3:  2c0b90eaba5 merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  4:  6646f6fd1ca =  4:  6646f6fd1ca merge-ort: set up a memory pool
  5:  7c49aa601d0 =  5:  7c49aa601d0 merge-ort: switch our strmaps over to using memory pools
  6:  08cf2498f96 =  6:  08cf2498f96 diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
  7:  4ffa5af8b57 =  7:  4ffa5af8b57 merge-ort: store filepairs and filespecs in our mem_pool
  8:  1556f0443c3 =  8:  1556f0443c3 merge-ort: reuse path strings in pool_alloc_filespec
  9:  de30dbac25e !  9:  f8cd50794e9 merge-ort: remove compile-time ability to turn off usage of memory pools
     @@ Metadata
       ## Commit message ##
          merge-ort: remove compile-time ability to turn off usage of memory pools
      
     -    Simplify code maintenance a bit by removing the ability to toggle
     -    between usage of memory pools and direct allocations.  This allows us to
     -    also remove and simplify some auxiliary functions.
     +    Simplify code maintenance by removing the ability to toggle between
     +    usage of memory pools and direct allocations.  This allows us to also
     +    remove paths_to_free since it was solely about bookkeeping to make sure
     +    we freed the necessary paths, and allows us to remove some auxiliary
     +    functions.
      
     +    Suggested-by: Jeff King <peff@peff.net>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## merge-ort.c ##
     @@ merge-ort.c
       /*
        * We have many arrays of size 3.  Whenever we have such an array, the
        * indices refer to one of the sides of the three-way merge.  This is so
     +@@ merge-ort.c: struct merge_options_internal {
     + 	 *   * these keys serve to intern all the path strings, which allows
     + 	 *     us to do pointer comparison on directory names instead of
     + 	 *     strcmp; we just have to be careful to use the interned strings.
     +-	 *     (Technically paths_to_free may track some strings that were
     +-	 *      removed from froms paths.)
     + 	 *
     + 	 * The values of paths:
     + 	 *   * either a pointer to a merged_info, or a conflict_info struct
     +@@ merge-ort.c: struct merge_options_internal {
     + 	 * freed together too.  Using a memory pool for these provides a
     + 	 * nice speedup.
     + 	 */
     +-	struct mem_pool internal_pool;
     +-	struct mem_pool *pool; /* NULL, or pointer to internal_pool */
     +-
     +-	/*
     +-	 * paths_to_free: additional list of strings to free
     +-	 *
     +-	 * If keys are removed from "paths", they are added to paths_to_free
     +-	 * to ensure they are later freed.  We avoid free'ing immediately since
     +-	 * other places (e.g. conflict_info.pathnames[]) may still be
     +-	 * referencing these paths.
     +-	 */
     +-	struct string_list paths_to_free;
     ++	struct mem_pool pool;
     + 
     + 	/*
     + 	 * output: special messages and conflict notices for various paths
     +@@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
     + 	void (*strset_clear_func)(struct strset *) =
     + 		reinitialize ? strset_partial_clear : strset_clear;
     + 
     +-	if (opti->pool)
     +-		strmap_clear_func(&opti->paths, 0);
     +-	else {
     +-		/*
     +-		 * We marked opti->paths with strdup_strings = 0, so that
     +-		 * we wouldn't have to make another copy of the fullpath
     +-		 * created by make_traverse_path from setup_path_info().
     +-		 * But, now that we've used it and have no other references
     +-		 * to these strings, it is time to deallocate them.
     +-		 */
     +-		free_strmap_strings(&opti->paths);
     +-		strmap_clear_func(&opti->paths, 1);
     +-	}
     ++	strmap_clear_func(&opti->paths, 0);
     + 
     + 	/*
     + 	 * All keys and values in opti->conflicted are a subset of those in
     +@@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
     + 	 */
     + 	strmap_clear_func(&opti->conflicted, 0);
     + 
     +-	if (!opti->pool) {
     +-		/*
     +-		 * opti->paths_to_free is similar to opti->paths; we
     +-		 * created it with strdup_strings = 0 to avoid making
     +-		 * _another_ copy of the fullpath but now that we've used
     +-		 * it and have no other references to these strings, it is
     +-		 * time to deallocate them.  We do so by temporarily
     +-		 * setting strdup_strings to 1.
     +-		 */
     +-		opti->paths_to_free.strdup_strings = 1;
     +-		string_list_clear(&opti->paths_to_free, 0);
     +-		opti->paths_to_free.strdup_strings = 0;
     +-	}
     +-
     + 	if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
     + 		discard_index(&opti->attr_index);
     + 
      @@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
       		strmap_clear(&opti->output, 0);
       	}
       
      -#if USE_MEMORY_POOL
     - 	mem_pool_discard(&opti->internal_pool, 0);
     - 	if (!reinitialize)
     - 		opti->pool = NULL;
     +-	mem_pool_discard(&opti->internal_pool, 0);
     +-	if (!reinitialize)
     +-		opti->pool = NULL;
      -#endif
     ++	mem_pool_discard(&opti->pool, 0);
       
       	/* Clean out callback_data as well. */
       	FREE_AND_NULL(renames->callback_data);
     @@ merge-ort.c: static void path_msg(struct merge_options *opt,
      -		return alloc_filespec(path);
      -
      -	/* Similar to alloc_filespec, but allocate from pool and reuse path */
     -+	assert(pool != NULL);
       	spec = mem_pool_calloc(pool, 1, sizeof(*spec));
       	spec->path = (char*)path; /* spec won't modify it */
       
     @@ merge-ort.c: static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
      -		return diff_queue(queue, one, two);
      -
      -	/* Same code as diff_queue, except allocate from pool */
     -+	assert(pool != NULL);
       	dp = mem_pool_calloc(pool, 1, sizeof(*dp));
       	dp->one = one;
       	dp->two = two;
     @@ merge-ort.c: static void setup_path_info(struct merge_options *opt,
      -	mi = pool_calloc(opt->priv->pool, 1,
      -			 resolved ? sizeof(struct merged_info) :
      -				    sizeof(struct conflict_info));
     -+	mi = mem_pool_calloc(opt->priv->pool, 1,
     ++	mi = mem_pool_calloc(&opt->priv->pool, 1,
      +			     resolved ? sizeof(struct merged_info) :
      +					sizeof(struct conflict_info));
       	mi->directory_name = current_dir_name;
       	mi->basename_offset = current_dir_name_len;
       	mi->clean = !!resolved;
     +@@ merge-ort.c: static void add_pair(struct merge_options *opt,
     + 		     unsigned dir_rename_mask)
     + {
     + 	struct diff_filespec *one, *two;
     +-	struct mem_pool *pool = opt->priv->pool;
     + 	struct rename_info *renames = &opt->priv->renames;
     + 	int names_idx = is_add ? side : 0;
     + 
     +@@ merge-ort.c: static void add_pair(struct merge_options *opt,
     + 			return;
     + 	}
     + 
     +-	one = pool_alloc_filespec(pool, pathname);
     +-	two = pool_alloc_filespec(pool, pathname);
     ++	one = pool_alloc_filespec(&opt->priv->pool, pathname);
     ++	two = pool_alloc_filespec(&opt->priv->pool, pathname);
     + 	fill_filespec(is_add ? two : one,
     + 		      &names[names_idx].oid, 1, names[names_idx].mode);
     +-	pool_diff_queue(pool, &renames->pairs[side], one, two);
     ++	pool_diff_queue(&opt->priv->pool, &renames->pairs[side], one, two);
     + }
     + 
     + static void collect_rename_info(struct merge_options *opt,
      @@ merge-ort.c: static int collect_merge_info_callback(int n,
       	len = traverse_path_len(info, p->pathlen);
       
       	/* +1 in both of the following lines to include the NUL byte */
      -	fullpath = pool_alloc(opt->priv->pool, len + 1);
     -+	fullpath = mem_pool_alloc(opt->priv->pool, len + 1);
     ++	fullpath = mem_pool_alloc(&opt->priv->pool, len + 1);
       	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
       
       	/*
     +@@ merge-ort.c: static int handle_deferred_entries(struct merge_options *opt,
     + 		copy = renames->deferred[side].possible_trivial_merges;
     + 		strintmap_init_with_options(&renames->deferred[side].possible_trivial_merges,
     + 					    0,
     +-					    opt->priv->pool,
     ++					    &opt->priv->pool,
     + 					    0);
     + 		strintmap_for_each_entry(&copy, &iter, entry) {
     + 			const char *path = entry->key;
      @@ merge-ort.c: static void apply_directory_rename_modifications(struct merge_options *opt,
     + 	VERIFY_CI(ci);
     + 
     + 	/* Find parent directories missing from opt->priv->paths */
     +-	if (opt->priv->pool) {
     +-		cur_path = mem_pool_strdup(opt->priv->pool, new_path);
     +-		free((char*)new_path);
     +-		new_path = (char *)cur_path;
     +-	} else {
     +-		cur_path = new_path;
     +-	}
     ++	cur_path = mem_pool_strdup(&opt->priv->pool, new_path);
     ++	free((char*)new_path);
     ++	new_path = (char *)cur_path;
     + 
     + 	while (1) {
       		/* Find the parent directory of cur_path */
       		char *last_slash = strrchr(cur_path, '/');
       		if (last_slash) {
      -			parent_name = pool_strndup(opt->priv->pool,
      -						   cur_path,
      -						   last_slash - cur_path);
     -+			parent_name = mem_pool_strndup(opt->priv->pool,
     ++			parent_name = mem_pool_strndup(&opt->priv->pool,
      +						       cur_path,
      +						       last_slash - cur_path);
       		} else {
       			parent_name = opt->priv->toplevel_dir;
       			break;
     +@@ merge-ort.c: static void apply_directory_rename_modifications(struct merge_options *opt,
     + 		/* Look it up in opt->priv->paths */
     + 		entry = strmap_get_entry(&opt->priv->paths, parent_name);
     + 		if (entry) {
     +-			if (!opt->priv->pool)
     +-				free((char*)parent_name);
     + 			parent_name = entry->key; /* reuse known pointer */
     + 			break;
     + 		}
     +@@ merge-ort.c: static void apply_directory_rename_modifications(struct merge_options *opt,
     + 		parent_name = cur_dir;
     + 	}
     + 
     +-	if (!opt->priv->pool) {
     +-		/*
     +-		 * We are removing old_path from opt->priv->paths.
     +-		 * old_path also will eventually need to be freed, but it
     +-		 * may still be used by e.g.  ci->pathnames.  So, store it
     +-		 * in another string-list for now.
     +-		 */
     +-		string_list_append(&opt->priv->paths_to_free, old_path);
     +-	}
     +-
     + 	assert(ci->filemask == 2 || ci->filemask == 4);
     + 	assert(ci->dirmask == 0);
     + 	strmap_remove(&opt->priv->paths, old_path, 0);
     +@@ merge-ort.c: static void apply_directory_rename_modifications(struct merge_options *opt,
     + 		new_ci->stages[index].mode = ci->stages[index].mode;
     + 		oidcpy(&new_ci->stages[index].oid, &ci->stages[index].oid);
     + 
     +-		if (!opt->priv->pool)
     +-			free(ci);
     + 		ci = new_ci;
     + 	}
     + 
     +@@ merge-ort.c: static void use_cached_pairs(struct merge_options *opt,
     + {
     + 	struct hashmap_iter iter;
     + 	struct strmap_entry *entry;
     +-	struct mem_pool *pool = opt->priv->pool;
     + 
     + 	/*
     + 	 * Add to side_pairs all entries from renames->cached_pairs[side_index].
     +@@ merge-ort.c: static void use_cached_pairs(struct merge_options *opt,
     + 		const char *new_name = entry->value;
     + 		if (!new_name)
     + 			new_name = old_name;
     +-		if (pool) {
     +-			/*
     +-			 * cached_pairs has _copies* of old_name and new_name,
     +-			 * because it has to persist across merges.  When
     +-			 *   pool != NULL
     +-			 * pool_alloc_filespec() will just re-use the existing
     +-			 * filenames, which will also get re-used by
     +-			 * opt->priv->paths if they become renames, and then
     +-			 * get freed at the end of the merge, leaving the copy
     +-			 * in cached_pairs dangling.  Avoid this by making a
     +-			 * copy here.
     +-			 *
     +-			 * When pool == NULL, pool_alloc_filespec() calls
     +-			 * alloc_filespec(), which makes a copy; we don't want
     +-			 * to add another.
     +-			 */
     +-			old_name = mem_pool_strdup(pool, old_name);
     +-			new_name = mem_pool_strdup(pool, new_name);
     +-		}
     ++
     ++		/*
     ++		 * cached_pairs has *copies* of old_name and new_name,
     ++		 * because it has to persist across merges.  Since
     ++		 * pool_alloc_filespec() will just re-use the existing
     ++		 * filenames, which will also get re-used by
     ++		 * opt->priv->paths if they become renames, and then
     ++		 * get freed at the end of the merge, that would leave
     ++		 * the copy in cached_pairs dangling.  Avoid this by
     ++		 * making a copy here.
     ++		 */
     ++		old_name = mem_pool_strdup(&opt->priv->pool, old_name);
     ++		new_name = mem_pool_strdup(&opt->priv->pool, new_name);
     + 
     + 		/* We don't care about oid/mode, only filenames and status */
     +-		one = pool_alloc_filespec(pool, old_name);
     +-		two = pool_alloc_filespec(pool, new_name);
     +-		pool_diff_queue(pool, pairs, one, two);
     ++		one = pool_alloc_filespec(&opt->priv->pool, old_name);
     ++		two = pool_alloc_filespec(&opt->priv->pool, new_name);
     ++		pool_diff_queue(&opt->priv->pool, pairs, one, two);
     + 		pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D';
     + 	}
     + }
     +@@ merge-ort.c: static int detect_regular_renames(struct merge_options *opt,
     + 	diff_queued_diff = renames->pairs[side_index];
     + 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
     + 	diffcore_rename_extended(&diff_opts,
     +-				 opt->priv->pool,
     ++				 &opt->priv->pool,
     + 				 &renames->relevant_sources[side_index],
     + 				 &renames->dirs_removed[side_index],
     + 				 &renames->dir_rename_count[side_index],
     +@@ merge-ort.c: static int collect_renames(struct merge_options *opt,
     + 
     + 		if (p->status != 'A' && p->status != 'R') {
     + 			possibly_cache_new_pair(renames, p, side_index, NULL);
     +-			pool_diff_free_filepair(opt->priv->pool, p);
     ++			pool_diff_free_filepair(&opt->priv->pool, p);
     + 			continue;
     + 		}
     + 
     +@@ merge-ort.c: static int collect_renames(struct merge_options *opt,
     + 
     + 		possibly_cache_new_pair(renames, p, side_index, new_path);
     + 		if (p->status != 'R' && !new_path) {
     +-			pool_diff_free_filepair(opt->priv->pool, p);
     ++			pool_diff_free_filepair(&opt->priv->pool, p);
     + 			continue;
     + 		}
     + 
     +@@ merge-ort.c: cleanup:
     + 		side_pairs = &renames->pairs[s];
     + 		for (i = 0; i < side_pairs->nr; ++i) {
     + 			struct diff_filepair *p = side_pairs->queue[i];
     +-			pool_diff_free_filepair(opt->priv->pool, p);
     ++			pool_diff_free_filepair(&opt->priv->pool, p);
     + 		}
     + 	}
     + 
     +@@ merge-ort.c: simple_cleanup:
     + 	if (combined.nr) {
     + 		int i;
     + 		for (i = 0; i < combined.nr; i++)
     +-			pool_diff_free_filepair(opt->priv->pool,
     ++			pool_diff_free_filepair(&opt->priv->pool,
     + 						combined.queue[i]);
     + 		free(combined.queue);
     + 	}
      @@ merge-ort.c: static void process_entry(struct merge_options *opt,
       		 * the directory to remain here, so we need to move this
       		 * path to some new location.
       		 */
      -		new_ci = pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
     -+		new_ci = mem_pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
     ++		new_ci = mem_pool_calloc(&opt->priv->pool, 1, sizeof(*new_ci));
       
       		/* We don't really want new_ci->merged.result copied, but it'll
       		 * be overwritten below so it doesn't matter.  We also don't
     @@ merge-ort.c: static void process_entry(struct merge_options *opt,
       			int rename_a = 0, rename_b = 0;
       
      -			new_ci = pool_alloc(opt->priv->pool, sizeof(*new_ci));
     -+			new_ci = mem_pool_alloc(opt->priv->pool,
     ++			new_ci = mem_pool_alloc(&opt->priv->pool,
      +						sizeof(*new_ci));
       
       			if (S_ISREG(a_mode))
       				rename_a = 1;
     +@@ merge-ort.c: static void process_entry(struct merge_options *opt,
     + 				b_path = path;
     + 			strmap_put(&opt->priv->paths, b_path, new_ci);
     + 
     +-			if (rename_a && rename_b) {
     ++			if (rename_a && rename_b)
     + 				strmap_remove(&opt->priv->paths, path, 0);
     +-				/*
     +-				 * We removed path from opt->priv->paths.  path
     +-				 * will also eventually need to be freed if not
     +-				 * part of a memory pool...but it may still be
     +-				 * used by e.g. ci->pathnames.  So, store it in
     +-				 * another string-list for now in that case.
     +-				 */
     +-				if (!opt->priv->pool)
     +-					string_list_append(&opt->priv->paths_to_free,
     +-							   path);
     +-			}
     + 
     + 			/*
     + 			 * Do special handling for b_path since process_entry()
      @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
       
       	/* Initialization of various renames fields */
       	renames = &opt->priv->renames;
      -#if USE_MEMORY_POOL
     - 	mem_pool_init(&opt->priv->internal_pool, 0);
     +-	mem_pool_init(&opt->priv->internal_pool, 0);
      -	opt->priv->pool = &opt->priv->internal_pool;
      -#else
      -	opt->priv->pool = NULL;
      -#endif
      -	pool = opt->priv->pool;
     -+	pool = opt->priv->pool = &opt->priv->internal_pool;
     ++	mem_pool_init(&opt->priv->pool, 0);
     ++	pool = &opt->priv->pool;
       	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
       		strintmap_init_with_options(&renames->dirs_removed[i],
       					    NOT_RELEVANT, pool, 0);
     +@@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
     + 	 * Although we initialize opt->priv->paths with strdup_strings=0,
     + 	 * that's just to avoid making yet another copy of an allocated
     + 	 * string.  Putting the entry into paths means we are taking
     +-	 * ownership, so we will later free it.  paths_to_free is similar.
     ++	 * ownership, so we will later free it.
     + 	 *
     + 	 * In contrast, conflicted just has a subset of keys from paths, so
     + 	 * we don't want to free those (it'd be a duplicate free).
     + 	 */
     + 	strmap_init_with_options(&opt->priv->paths, pool, 0);
     + 	strmap_init_with_options(&opt->priv->conflicted, pool, 0);
     +-	if (!opt->priv->pool)
     +-		string_list_init_nodup(&opt->priv->paths_to_free);
     + 
     + 	/*
     + 	 * keys & strbufs in output will sometimes need to outlive "paths",

-- 
gitgitgadget


* [PATCH v4 1/9] merge-ort: rename str{map,intmap,set}_func()
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
@ 2021-07-31 17:27       ` Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 2/9] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
                         ` (9 subsequent siblings)
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-31 17:27 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

To make it clearer that these three function-pointer variables refer to
functions that clear a strmap/strintmap/strset, rename them to
str{map,intmap,set}_clear_func().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index e75b524153e..401a40247a3 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -519,11 +519,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 {
 	struct rename_info *renames = &opti->renames;
 	int i;
-	void (*strmap_func)(struct strmap *, int) =
+	void (*strmap_clear_func)(struct strmap *, int) =
 		reinitialize ? strmap_partial_clear : strmap_clear;
-	void (*strintmap_func)(struct strintmap *) =
+	void (*strintmap_clear_func)(struct strintmap *) =
 		reinitialize ? strintmap_partial_clear : strintmap_clear;
-	void (*strset_func)(struct strset *) =
+	void (*strset_clear_func)(struct strset *) =
 		reinitialize ? strset_partial_clear : strset_clear;
 
 	/*
@@ -534,14 +534,14 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	 * to deallocate them.
 	 */
 	free_strmap_strings(&opti->paths);
-	strmap_func(&opti->paths, 1);
+	strmap_clear_func(&opti->paths, 1);
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
 	 * opti->paths.  We don't want to deallocate anything twice, so we
 	 * don't free the keys and we pass 0 for free_values.
 	 */
-	strmap_func(&opti->conflicted, 0);
+	strmap_clear_func(&opti->conflicted, 0);
 
 	/*
 	 * opti->paths_to_free is similar to opti->paths; we created it with
@@ -559,24 +559,24 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 
 	/* Free memory used by various renames maps */
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		strintmap_func(&renames->dirs_removed[i]);
-		strmap_func(&renames->dir_renames[i], 0);
-		strintmap_func(&renames->relevant_sources[i]);
+		strintmap_clear_func(&renames->dirs_removed[i]);
+		strmap_clear_func(&renames->dir_renames[i], 0);
+		strintmap_clear_func(&renames->relevant_sources[i]);
 		if (!reinitialize)
 			assert(renames->cached_pairs_valid_side == 0);
 		if (i != renames->cached_pairs_valid_side &&
 		    -1 != renames->cached_pairs_valid_side) {
-			strset_func(&renames->cached_target_names[i]);
-			strmap_func(&renames->cached_pairs[i], 1);
-			strset_func(&renames->cached_irrelevant[i]);
+			strset_clear_func(&renames->cached_target_names[i]);
+			strmap_clear_func(&renames->cached_pairs[i], 1);
+			strset_clear_func(&renames->cached_irrelevant[i]);
 			partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
 			if (!reinitialize)
 				strmap_clear(&renames->dir_rename_count[i], 1);
 		}
 	}
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		strintmap_func(&renames->deferred[i].possible_trivial_merges);
-		strset_func(&renames->deferred[i].target_dirs);
+		strintmap_clear_func(&renames->deferred[i].possible_trivial_merges);
+		strset_clear_func(&renames->deferred[i].target_dirs);
 		renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
 	}
 	renames->cached_pairs_valid_side = 0;
-- 
gitgitgadget


* [PATCH v4 2/9] diffcore-rename: use a mem_pool for exact rename detection's hashmap
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 1/9] merge-ort: rename str{map,intmap,set}_func() Elijah Newren via GitGitGadget
@ 2021-07-31 17:27       ` Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 3/9] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
                         ` (8 subsequent siblings)
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-31 17:27 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Exact rename detection, via insert_file_table(), uses a hashmap to store
files by oid.  Use a mem_pool for the hashmap entries so these can all be
allocated and deallocated together.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      204.2  ms ±  3.0  ms   202.5  ms ±  3.2  ms
    mega-renames:      1.076 s ±  0.015 s     1.072 s ±  0.012 s
    just-one-mega:   364.1  ms ±  7.0  ms   357.3  ms ±  3.9  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
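
Not part of the commit: a minimal sketch of the "allocate many, discard
once" idea the message above relies on.  This is a toy bump allocator
written only for illustration; it is not git's mem_pool, and every toy_*
name is hypothetical.  The 32*1024 block size simply mirrors the value
passed to mem_pool_init() in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy bump allocator; NOT git's mem_pool, just the same idea in miniature. */
struct toy_block {
	struct toy_block *next;
	size_t used, cap;	/* bytes handed out / bytes available */
	max_align_t data[];	/* payload, suitably aligned for any object */
};

struct toy_pool {
	struct toy_block *head;
	size_t block_size;
	size_t nr_blocks;	/* bookkeeping for the demo only */
};

static void toy_pool_init(struct toy_pool *pool, size_t block_size)
{
	pool->head = NULL;
	pool->block_size = block_size;
	pool->nr_blocks = 0;
}

static void *toy_pool_alloc(struct toy_pool *pool, size_t len)
{
	const size_t a = _Alignof(max_align_t);
	struct toy_block *b = pool->head;
	void *ret;

	/* round up so every returned pointer stays suitably aligned */
	len = (len + a - 1) / a * a;

	if (!b || b->cap - b->used < len) {
		/* current block is full (or absent): start a new one */
		size_t cap = len > pool->block_size ? len : pool->block_size;

		b = malloc(sizeof(*b) + cap);
		if (!b)
			abort();
		b->next = pool->head;
		b->used = 0;
		b->cap = cap;
		pool->head = b;
		pool->nr_blocks++;
	}
	ret = (char *)b->data + b->used;
	b->used += len;
	return ret;
}

/* One free() per block, no matter how many objects were handed out. */
static void toy_pool_discard(struct toy_pool *pool)
{
	while (pool->head) {
		struct toy_block *next = pool->head->next;

		free(pool->head);
		pool->head = next;
	}
	pool->nr_blocks = 0;
}

struct toy_entry {
	int index;
	const char *name;
};

/* Allocate n entries from a pool, then release them all in one discard. */
static size_t toy_demo(size_t n)
{
	struct toy_pool pool;
	size_t i, blocks;

	toy_pool_init(&pool, 32 * 1024);
	for (i = 0; i < n; i++) {
		struct toy_entry *e = toy_pool_alloc(&pool, sizeof(*e));

		e->index = (int)i;
		e->name = "example";
	}
	blocks = pool.nr_blocks;
	toy_pool_discard(&pool);
	return blocks;
}
```

toy_demo() returns how many blocks were needed; the point is that the
number of malloc()/free() calls tracks the number of blocks rather than
the number of entries.
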
 diffcore-rename.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 4ef0459cfb5..73d884099eb 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -317,10 +317,11 @@ static int find_identical_files(struct hashmap *srcs,
 }
 
 static void insert_file_table(struct repository *r,
+			      struct mem_pool *pool,
 			      struct hashmap *table, int index,
 			      struct diff_filespec *filespec)
 {
-	struct file_similarity *entry = xmalloc(sizeof(*entry));
+	struct file_similarity *entry = mem_pool_alloc(pool, sizeof(*entry));
 
 	entry->index = index;
 	entry->filespec = filespec;
@@ -336,7 +337,8 @@ static void insert_file_table(struct repository *r,
  * and then during the second round we try to match
  * cache-dirty entries as well.
  */
-static int find_exact_renames(struct diff_options *options)
+static int find_exact_renames(struct diff_options *options,
+			      struct mem_pool *pool)
 {
 	int i, renames = 0;
 	struct hashmap file_table;
@@ -346,7 +348,7 @@ static int find_exact_renames(struct diff_options *options)
 	 */
 	hashmap_init(&file_table, NULL, NULL, rename_src_nr);
 	for (i = rename_src_nr-1; i >= 0; i--)
-		insert_file_table(options->repo,
+		insert_file_table(options->repo, pool,
 				  &file_table, i,
 				  rename_src[i].p->one);
 
@@ -354,8 +356,8 @@ static int find_exact_renames(struct diff_options *options)
 	for (i = 0; i < rename_dst_nr; i++)
 		renames += find_identical_files(&file_table, i, options);
 
-	/* Free the hash data structure and entries */
-	hashmap_clear_and_free(&file_table, struct file_similarity, entry);
+	/* Free the hash data structure (entries will be freed with the pool) */
+	hashmap_clear(&file_table);
 
 	return renames;
 }
@@ -1341,6 +1343,7 @@ void diffcore_rename_extended(struct diff_options *options,
 	int num_destinations, dst_cnt;
 	int num_sources, want_copies;
 	struct progress *progress = NULL;
+	struct mem_pool local_pool;
 	struct dir_rename_info info;
 	struct diff_populate_filespec_options dpf_options = {
 		.check_binary = 0,
@@ -1409,11 +1412,18 @@ void diffcore_rename_extended(struct diff_options *options,
 		goto cleanup; /* nothing to do */
 
 	trace2_region_enter("diff", "exact renames", options->repo);
+	mem_pool_init(&local_pool, 32*1024);
 	/*
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
-	rename_count = find_exact_renames(options);
+	rename_count = find_exact_renames(options, &local_pool);
+	/*
+	 * Discard local_pool immediately instead of at "cleanup:" in order
+	 * to reduce maximum memory usage; inexact rename detection uses up
+	 * a fair amount of memory, and mem_pools can too.
+	 */
+	mem_pool_discard(&local_pool, 0);
 	trace2_region_leave("diff", "exact renames", options->repo);
 
 	/* Did we only want exact renames? */
-- 
gitgitgadget


* [PATCH v4 3/9] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 1/9] merge-ort: rename str{map,intmap,set}_func() Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 2/9] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
@ 2021-07-31 17:27       ` Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 4/9] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
                         ` (7 subsequent siblings)
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-31 17:27 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Make the code flexible enough to run either with or without a memory
pool by adding utility functions which will call either
    xmalloc, xcalloc, xstrndup
or
    mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
depending on whether we have a non-NULL memory pool.  A subsequent
commit will make use of these.

(We will actually be dropping these functions soon and just assuming we
always have a memory pool, but the flexibility was very useful during
development of merge-ort so I want to be able to restore it if needed.)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
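
Not part of the commit: a small standalone sketch of the dispatch
pattern described above, where a NULL pool pointer means "use the plain
heap".  The fixed-size toy_pool and the maybe_pool_* names are
hypothetical stand-ins chosen for illustration; they are not the helpers
added by this patch.  The sketch also shows the release-side
consequence: only heap-allocated memory may be freed individually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size arena standing in for a real memory pool. */
struct toy_pool {
	_Alignas(max_align_t) char buf[4096];
	size_t used;
};

static void *toy_pool_alloc(struct toy_pool *pool, size_t len)
{
	const size_t a = _Alignof(max_align_t);
	void *ret;

	len = (len + a - 1) / a * a;
	if (len > sizeof(pool->buf) - pool->used)
		abort();	/* a real pool would grab another block */
	ret = pool->buf + pool->used;
	pool->used += len;
	return ret;
}

/* Dispatch on the pool pointer: NULL means "plain heap allocation". */
static void *maybe_pool_alloc(struct toy_pool *pool, size_t len)
{
	void *ret;

	if (pool)
		return toy_pool_alloc(pool, len);
	ret = malloc(len);
	if (!ret)
		abort();
	return ret;
}

/* Copy at most len bytes of s, always NUL-terminating the result. */
static char *maybe_pool_strndup(struct toy_pool *pool, const char *s,
				size_t len)
{
	const char *end = memchr(s, '\0', len);
	size_t n = end ? (size_t)(end - s) : len;
	char *ret = maybe_pool_alloc(pool, n + 1);

	memcpy(ret, s, n);
	ret[n] = '\0';
	return ret;
}

/* Counterpart on the release side: only heap memory is freed one by one. */
static void maybe_pool_free(struct toy_pool *pool, void *p)
{
	if (!pool)
		free(p);
}
```

Callers pass the same pool pointer to the allocation and release helpers,
so switching between the two modes changes one pointer rather than every
call site.
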
 merge-ort.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 401a40247a3..63f67246d3d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -664,6 +664,30 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
+MAYBE_UNUSED
+static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
+{
+	if (!pool)
+		return xcalloc(count, size);
+	return mem_pool_calloc(pool, count, size);
+}
+
+MAYBE_UNUSED
+static void *pool_alloc(struct mem_pool *pool, size_t size)
+{
+	if (!pool)
+		return xmalloc(size);
+	return mem_pool_alloc(pool, size);
+}
+
+MAYBE_UNUSED
+static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
+{
+	if (!pool)
+		return xstrndup(str, len);
+	return mem_pool_strndup(pool, str, len);
+}
+
 /* add a string to a strbuf, but converting "/" to "_" */
 static void add_flattened_path(struct strbuf *out, const char *s)
 {
-- 
gitgitgadget


* [PATCH v4 4/9] merge-ort: set up a memory pool
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
                         ` (2 preceding siblings ...)
  2021-07-31 17:27       ` [PATCH v4 3/9] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
@ 2021-07-31 17:27       ` Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 5/9] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
                         ` (6 subsequent siblings)
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-31 17:27 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge-ort has a lot of data structures, and they all tend to be freed
together in clear_or_reinit_internal_opts().  Set up a memory pool to
allow us to make these allocations and deallocations faster.  Future
commits will adjust various callers to make use of this memory pool.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
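
Not part of the commit: a reduced sketch of the layout this patch sets
up, with an embedded pool plus a pointer that is either NULL or points
at it.  The toy_* names and the TOY_USE_POOL macro are hypothetical, and
the "pool" here is only a placeholder flag; the sketch is meant to show
the shape of the start/clear lifecycle, not real allocation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_USE_POOL 1	/* flip to 0 to fall back to per-object malloc/free */

struct toy_pool {
	int live;	/* placeholder; a real pool tracks its blocks */
};

struct toy_state {
	struct toy_pool internal_pool;
	struct toy_pool *pool;	/* NULL, or pointer to internal_pool */
};

static void toy_start(struct toy_state *st)
{
#if TOY_USE_POOL
	st->internal_pool.live = 1;
	st->pool = &st->internal_pool;
#else
	st->pool = NULL;
#endif
}

static void toy_clear(struct toy_state *st, int reinitialize)
{
#if TOY_USE_POOL
	st->internal_pool.live = 0;	/* stands in for discarding the pool */
	if (!reinitialize)
		st->pool = NULL;	/* fully torn down: no pool any more */
#else
	(void)st;
	(void)reinitialize;
#endif
}
```

Because the rest of the code only ever tests the pointer, the
compile-time switch stays confined to these two functions.
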
 merge-ort.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 63f67246d3d..3f425436263 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -37,6 +37,8 @@
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
 
+#define USE_MEMORY_POOL 1 /* faster, but obscures memory leak hunting */
+
 /*
  * We have many arrays of size 3.  Whenever we have such an array, the
  * indices refer to one of the sides of the three-way merge.  This is so
@@ -339,6 +341,17 @@ struct merge_options_internal {
 	 */
 	struct strmap conflicted;
 
+	/*
+	 * pool: memory pool for fast allocation/deallocation
+	 *
+	 * We allocate room for lots of filenames and auxiliary data
+	 * structures in merge_options_internal, and it tends to all be
+	 * freed together too.  Using a memory pool for these provides a
+	 * nice speedup.
+	 */
+	struct mem_pool internal_pool;
+	struct mem_pool *pool; /* NULL, or pointer to internal_pool */
+
 	/*
 	 * paths_to_free: additional list of strings to free
 	 *
@@ -603,6 +616,12 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_clear(&opti->output, 0);
 	}
 
+#if USE_MEMORY_POOL
+	mem_pool_discard(&opti->internal_pool, 0);
+	if (!reinitialize)
+		opti->pool = NULL;
+#endif
+
 	/* Clean out callback_data as well. */
 	FREE_AND_NULL(renames->callback_data);
 	renames->callback_data_nr = renames->callback_data_alloc = 0;
@@ -4381,6 +4400,12 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	/* Initialization of various renames fields */
 	renames = &opt->priv->renames;
+#if USE_MEMORY_POOL
+	mem_pool_init(&opt->priv->internal_pool, 0);
+	opt->priv->pool = &opt->priv->internal_pool;
+#else
+	opt->priv->pool = NULL;
+#endif
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
 					    NOT_RELEVANT, NULL, 0);
-- 
gitgitgadget


* [PATCH v4 5/9] merge-ort: switch our strmaps over to using memory pools
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
                         ` (3 preceding siblings ...)
  2021-07-31 17:27       ` [PATCH v4 4/9] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
@ 2021-07-31 17:27       ` Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 6/9] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
                         ` (5 subsequent siblings)
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-31 17:27 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

For all the strmaps (including strintmaps and strsets) whose memory is
unconditionally freed as part of clear_or_reinit_internal_opts(), switch
them over to using our new memory pool.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      202.5  ms ±  3.2  ms    198.1 ms ±  2.6 ms
    mega-renames:      1.072 s ±  0.012 s    715.8 ms ±  4.0 ms
    just-one-mega:   357.3  ms ±  3.9  ms    276.8 ms ±  4.2 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 125 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 75 insertions(+), 50 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 3f425436263..99c75690855 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -539,15 +539,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	void (*strset_clear_func)(struct strset *) =
 		reinitialize ? strset_partial_clear : strset_clear;
 
-	/*
-	 * We marked opti->paths with strdup_strings = 0, so that we
-	 * wouldn't have to make another copy of the fullpath created by
-	 * make_traverse_path from setup_path_info().  But, now that we've
-	 * used it and have no other references to these strings, it is time
-	 * to deallocate them.
-	 */
-	free_strmap_strings(&opti->paths);
-	strmap_clear_func(&opti->paths, 1);
+	if (opti->pool)
+		strmap_clear_func(&opti->paths, 0);
+	else {
+		/*
+		 * We marked opti->paths with strdup_strings = 0, so that
+		 * we wouldn't have to make another copy of the fullpath
+		 * created by make_traverse_path from setup_path_info().
+		 * But, now that we've used it and have no other references
+		 * to these strings, it is time to deallocate them.
+		 */
+		free_strmap_strings(&opti->paths);
+		strmap_clear_func(&opti->paths, 1);
+	}
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
@@ -556,16 +560,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	 */
 	strmap_clear_func(&opti->conflicted, 0);
 
-	/*
-	 * opti->paths_to_free is similar to opti->paths; we created it with
-	 * strdup_strings = 0 to avoid making _another_ copy of the fullpath
-	 * but now that we've used it and have no other references to these
-	 * strings, it is time to deallocate them.  We do so by temporarily
-	 * setting strdup_strings to 1.
-	 */
-	opti->paths_to_free.strdup_strings = 1;
-	string_list_clear(&opti->paths_to_free, 0);
-	opti->paths_to_free.strdup_strings = 0;
+	if (!opti->pool) {
+		/*
+		 * opti->paths_to_free is similar to opti->paths; we
+		 * created it with strdup_strings = 0 to avoid making
+		 * _another_ copy of the fullpath but now that we've used
+		 * it and have no other references to these strings, it is
+		 * time to deallocate them.  We do so by temporarily
+		 * setting strdup_strings to 1.
+		 */
+		opti->paths_to_free.strdup_strings = 1;
+		string_list_clear(&opti->paths_to_free, 0);
+		opti->paths_to_free.strdup_strings = 0;
+	}
 
 	if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
 		discard_index(&opti->attr_index);
@@ -683,7 +690,6 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
-MAYBE_UNUSED
 static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 {
 	if (!pool)
@@ -691,7 +697,6 @@ static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 	return mem_pool_calloc(pool, count, size);
 }
 
-MAYBE_UNUSED
 static void *pool_alloc(struct mem_pool *pool, size_t size)
 {
 	if (!pool)
@@ -699,7 +704,6 @@ static void *pool_alloc(struct mem_pool *pool, size_t size)
 	return mem_pool_alloc(pool, size);
 }
 
-MAYBE_UNUSED
 static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
 {
 	if (!pool)
@@ -835,8 +839,9 @@ static void setup_path_info(struct merge_options *opt,
 	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
 	assert(resolved == (merged_version != NULL));
 
-	mi = xcalloc(1, resolved ? sizeof(struct merged_info) :
-				   sizeof(struct conflict_info));
+	mi = pool_calloc(opt->priv->pool, 1,
+			 resolved ? sizeof(struct merged_info) :
+				    sizeof(struct conflict_info));
 	mi->directory_name = current_dir_name;
 	mi->basename_offset = current_dir_name_len;
 	mi->clean = !!resolved;
@@ -1128,7 +1133,7 @@ static int collect_merge_info_callback(int n,
 	len = traverse_path_len(info, p->pathlen);
 
 	/* +1 in both of the following lines to include the NUL byte */
-	fullpath = xmalloc(len + 1);
+	fullpath = pool_alloc(opt->priv->pool, len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
@@ -1383,7 +1388,7 @@ static int handle_deferred_entries(struct merge_options *opt,
 		copy = renames->deferred[side].possible_trivial_merges;
 		strintmap_init_with_options(&renames->deferred[side].possible_trivial_merges,
 					    0,
-					    NULL,
+					    opt->priv->pool,
 					    0);
 		strintmap_for_each_entry(&copy, &iter, entry) {
 			const char *path = entry->key;
@@ -2335,12 +2340,21 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 	VERIFY_CI(ci);
 
 	/* Find parent directories missing from opt->priv->paths */
-	cur_path = new_path;
+	if (opt->priv->pool) {
+		cur_path = mem_pool_strdup(opt->priv->pool, new_path);
+		free((char*)new_path);
+		new_path = (char *)cur_path;
+	} else {
+		cur_path = new_path;
+	}
+
 	while (1) {
 		/* Find the parent directory of cur_path */
 		char *last_slash = strrchr(cur_path, '/');
 		if (last_slash) {
-			parent_name = xstrndup(cur_path, last_slash - cur_path);
+			parent_name = pool_strndup(opt->priv->pool,
+						   cur_path,
+						   last_slash - cur_path);
 		} else {
 			parent_name = opt->priv->toplevel_dir;
 			break;
@@ -2349,7 +2363,8 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		/* Look it up in opt->priv->paths */
 		entry = strmap_get_entry(&opt->priv->paths, parent_name);
 		if (entry) {
-			free((char*)parent_name);
+			if (!opt->priv->pool)
+				free((char*)parent_name);
 			parent_name = entry->key; /* reuse known pointer */
 			break;
 		}
@@ -2376,12 +2391,15 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		parent_name = cur_dir;
 	}
 
-	/*
-	 * We are removing old_path from opt->priv->paths.  old_path also will
-	 * eventually need to be freed, but it may still be used by e.g.
-	 * ci->pathnames.  So, store it in another string-list for now.
-	 */
-	string_list_append(&opt->priv->paths_to_free, old_path);
+	if (!opt->priv->pool) {
+		/*
+		 * We are removing old_path from opt->priv->paths.
+		 * old_path also will eventually need to be freed, but it
+		 * may still be used by e.g.  ci->pathnames.  So, store it
+		 * in another string-list for now.
+		 */
+		string_list_append(&opt->priv->paths_to_free, old_path);
+	}
 
 	assert(ci->filemask == 2 || ci->filemask == 4);
 	assert(ci->dirmask == 0);
@@ -2416,7 +2434,8 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		new_ci->stages[index].mode = ci->stages[index].mode;
 		oidcpy(&new_ci->stages[index].oid, &ci->stages[index].oid);
 
-		free(ci);
+		if (!opt->priv->pool)
+			free(ci);
 		ci = new_ci;
 	}
 
@@ -3623,7 +3642,8 @@ static void process_entry(struct merge_options *opt,
 		 * the directory to remain here, so we need to move this
 		 * path to some new location.
 		 */
-		CALLOC_ARRAY(new_ci, 1);
+		new_ci = pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
+
 		/* We don't really want new_ci->merged.result copied, but it'll
 		 * be overwritten below so it doesn't matter.  We also don't
 		 * want any directory mode/oid values copied, but we'll zero
@@ -3715,7 +3735,7 @@ static void process_entry(struct merge_options *opt,
 			const char *a_path = NULL, *b_path = NULL;
 			int rename_a = 0, rename_b = 0;
 
-			new_ci = xmalloc(sizeof(*new_ci));
+			new_ci = pool_alloc(opt->priv->pool, sizeof(*new_ci));
 
 			if (S_ISREG(a_mode))
 				rename_a = 1;
@@ -3788,12 +3808,14 @@ static void process_entry(struct merge_options *opt,
 				strmap_remove(&opt->priv->paths, path, 0);
 				/*
 				 * We removed path from opt->priv->paths.  path
-				 * will also eventually need to be freed, but
-				 * it may still be used by e.g.  ci->pathnames.
-				 * So, store it in another string-list for now.
+				 * will also eventually need to be freed if not
+				 * part of a memory pool...but it may still be
+				 * used by e.g. ci->pathnames.  So, store it in
+				 * another string-list for now in that case.
 				 */
-				string_list_append(&opt->priv->paths_to_free,
-						   path);
+				if (!opt->priv->pool)
+					string_list_append(&opt->priv->paths_to_free,
+							   path);
 			}
 
 			/*
@@ -4335,6 +4357,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 {
 	struct rename_info *renames;
 	int i;
+	struct mem_pool *pool = NULL;
 
 	/* Sanity checks on opt */
 	trace2_region_enter("merge", "sanity checks", opt->repo);
@@ -4406,9 +4429,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 #else
 	opt->priv->pool = NULL;
 #endif
+	pool = opt->priv->pool;
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
-					    NOT_RELEVANT, NULL, 0);
+					    NOT_RELEVANT, pool, 0);
 		strmap_init_with_options(&renames->dir_rename_count[i],
 					 NULL, 1);
 		strmap_init_with_options(&renames->dir_renames[i],
@@ -4422,7 +4446,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 		 */
 		strintmap_init_with_options(&renames->relevant_sources[i],
 					    -1 /* explicitly invalid */,
-					    NULL, 0);
+					    pool, 0);
 		strmap_init_with_options(&renames->cached_pairs[i],
 					 NULL, 1);
 		strset_init_with_options(&renames->cached_irrelevant[i],
@@ -4432,9 +4456,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	}
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->deferred[i].possible_trivial_merges,
-					    0, NULL, 0);
+					    0, pool, 0);
 		strset_init_with_options(&renames->deferred[i].target_dirs,
-					 NULL, 1);
+					 pool, 1);
 		renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
 	}
 
@@ -4447,9 +4471,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * In contrast, conflicted just has a subset of keys from paths, so
 	 * we don't want to free those (it'd be a duplicate free).
 	 */
-	strmap_init_with_options(&opt->priv->paths, NULL, 0);
-	strmap_init_with_options(&opt->priv->conflicted, NULL, 0);
-	string_list_init_nodup(&opt->priv->paths_to_free);
+	strmap_init_with_options(&opt->priv->paths, pool, 0);
+	strmap_init_with_options(&opt->priv->conflicted, pool, 0);
+	if (!opt->priv->pool)
+		string_list_init_nodup(&opt->priv->paths_to_free);
 
 	/*
 	 * keys & strbufs in output will sometimes need to outlive "paths",
-- 
gitgitgadget


* [PATCH v4 6/9] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
                         ` (4 preceding siblings ...)
  2021-07-31 17:27       ` [PATCH v4 5/9] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
@ 2021-07-31 17:27       ` Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 7/9] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
                         ` (4 subsequent siblings)
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-31 17:27 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

We want to be able to allocate filespecs and filepairs using a mem_pool.
However, filespec data will still remain outside the pool (perhaps in
the future we could plumb the pool through the various diff APIs to
allocate the filespec data too, but for now we are limiting the scope).
Add some extra functions to allocate these appropriately based on the
non-NULL-ness of opt->priv->pool, as well as some extra functions to
handle correctly deallocating the relevant parts of them.  A future
commit will make use of these new functions.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
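
Not part of the commit: a standalone sketch of the two ideas described
above.  First, the header and its path string share one allocation (the
string sits just past the struct), so whichever allocator provides the
header also provides the path.  Second, releasing drops a reference and
frees only the separately owned data, never the header itself.  The
toy_spec type and helper names are hypothetical, and the allocator is
passed in as a function pointer purely to keep the example
self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a filespec: header plus separately owned data. */
struct toy_spec {
	char *path;	/* points just past this struct, same allocation */
	void *data;	/* heap-allocated lazily; never lives in the pool */
	int count;	/* reference count */
};

/* "alloc" is any allocator returning zeroed memory of the given size. */
static struct toy_spec *toy_spec_new(void *(*alloc)(size_t), const char *path)
{
	size_t len = strlen(path);
	struct toy_spec *spec = alloc(sizeof(*spec) + len + 1);

	memcpy(spec + 1, path, len);	/* NUL comes from the zeroed memory */
	spec->path = (char *)(spec + 1);
	spec->count = 1;
	return spec;
}

/* Drop one reference; free only the out-of-pool data, never the header. */
static void toy_spec_release_data(struct toy_spec *spec)
{
	if (!--spec->count) {
		free(spec->data);
		spec->data = NULL;
	}
}

/* Simple zeroing heap allocator used to exercise the sketch. */
static void *zeroed(size_t n)
{
	void *p = calloc(1, n);

	if (!p)
		abort();
	return p;
}
```

With a pool-backed allocator in place of zeroed(), the header and path
would vanish when the pool is discarded, which is why the release helper
must leave them alone.
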
 diffcore-rename.c | 41 +++++++++++++++++++++++++++++++++++++++++
 diffcore.h        |  2 ++
 merge-ort.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 73d884099eb..5bc559f79e9 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1328,6 +1328,47 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
 	rename_src_nr = new_num_src;
 }
 
+static void free_filespec_data(struct diff_filespec *spec)
+{
+	if (!--spec->count)
+		diff_free_filespec_data(spec);
+}
+
+MAYBE_UNUSED
+static void pool_free_filespec(struct mem_pool *pool,
+			       struct diff_filespec *spec)
+{
+	if (!pool) {
+		free_filespec(spec);
+		return;
+	}
+
+	/*
+	 * Similar to free_filespec(), but only frees the data.  The spec
+	 * itself was allocated in the pool and should not be individually
+	 * freed.
+	 */
+	free_filespec_data(spec);
+}
+
+MAYBE_UNUSED
+void pool_diff_free_filepair(struct mem_pool *pool,
+			     struct diff_filepair *p)
+{
+	if (!pool) {
+		diff_free_filepair(p);
+		return;
+	}
+
+	/*
+	 * Similar to diff_free_filepair() but only frees the data from the
+	 * filespecs; not the filespecs or the filepair which were
+	 * allocated from the pool.
+	 */
+	free_filespec_data(p->one);
+	free_filespec_data(p->two);
+}
+
 void diffcore_rename_extended(struct diff_options *options,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
diff --git a/diffcore.h b/diffcore.h
index 533b30e21e7..b58ee6b1934 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -127,6 +127,8 @@ struct diff_filepair {
 #define DIFF_PAIR_MODE_CHANGED(p) ((p)->one->mode != (p)->two->mode)
 
 void diff_free_filepair(struct diff_filepair *);
+void pool_diff_free_filepair(struct mem_pool *pool,
+			     struct diff_filepair *p);
 
 int diff_unmodified_pair(struct diff_filepair *);
 
diff --git a/merge-ort.c b/merge-ort.c
index 99c75690855..e79830f9181 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -690,6 +690,48 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
+MAYBE_UNUSED
+static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
+						 const char *path)
+{
+	struct diff_filespec *spec;
+	size_t len;
+
+	if (!pool)
+		return alloc_filespec(path);
+
+	/* Same code as alloc_filespec, except allocate from pool */
+	len = strlen(path);
+
+	spec = mem_pool_calloc(pool, 1, st_add3(sizeof(*spec), len, 1));
+	memcpy(spec+1, path, len);
+	spec->path = (void*)(spec+1);
+
+	spec->count = 1;
+	spec->is_binary = -1;
+	return spec;
+}
+
+MAYBE_UNUSED
+static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
+					     struct diff_queue_struct *queue,
+					     struct diff_filespec *one,
+					     struct diff_filespec *two)
+{
+	struct diff_filepair *dp;
+
+	if (!pool)
+		return diff_queue(queue, one, two);
+
+	/* Same code as diff_queue, except allocate from pool */
+	dp = mem_pool_calloc(pool, 1, sizeof(*dp));
+	dp->one = one;
+	dp->two = two;
+	if (queue)
+		diff_q(queue, dp);
+	return dp;
+}
+
 static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 {
 	if (!pool)
-- 
gitgitgadget


* [PATCH v4 7/9] merge-ort: store filepairs and filespecs in our mem_pool
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
                         ` (5 preceding siblings ...)
  2021-07-31 17:27       ` [PATCH v4 6/9] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
@ 2021-07-31 17:27       ` Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 8/9] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
                         ` (3 subsequent siblings)
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-31 17:27 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.1 ms ±  2.6 ms     198.5 ms ±  3.4 ms
    mega-renames:     715.8 ms ±  4.0 ms     679.1 ms ±  5.6 ms
    just-one-mega:    276.8 ms ±  4.2 ms     271.9 ms ±  2.8 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c |  9 ++++-----
 diffcore.h        |  1 +
 merge-ort.c       | 26 ++++++++++++++------------
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 5bc559f79e9..7e6b3e1b143 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1334,7 +1334,6 @@ static void free_filespec_data(struct diff_filespec *spec)
 		diff_free_filespec_data(spec);
 }
 
-MAYBE_UNUSED
 static void pool_free_filespec(struct mem_pool *pool,
 			       struct diff_filespec *spec)
 {
@@ -1351,7 +1350,6 @@ static void pool_free_filespec(struct mem_pool *pool,
 	free_filespec_data(spec);
 }
 
-MAYBE_UNUSED
 void pool_diff_free_filepair(struct mem_pool *pool,
 			     struct diff_filepair *p)
 {
@@ -1370,6 +1368,7 @@ void pool_diff_free_filepair(struct mem_pool *pool,
 }
 
 void diffcore_rename_extended(struct diff_options *options,
+			      struct mem_pool *pool,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
 			      struct strmap *dir_rename_count,
@@ -1683,7 +1682,7 @@ void diffcore_rename_extended(struct diff_options *options,
 			pair_to_free = p;
 
 		if (pair_to_free)
-			diff_free_filepair(pair_to_free);
+			pool_diff_free_filepair(pool, pair_to_free);
 	}
 	diff_debug_queue("done copying original", &outq);
 
@@ -1693,7 +1692,7 @@ void diffcore_rename_extended(struct diff_options *options,
 
 	for (i = 0; i < rename_dst_nr; i++)
 		if (rename_dst[i].filespec_to_free)
-			free_filespec(rename_dst[i].filespec_to_free);
+			pool_free_filespec(pool, rename_dst[i].filespec_to_free);
 
 	cleanup_dir_rename_info(&info, dirs_removed, dir_rename_count != NULL);
 	FREE_AND_NULL(rename_dst);
@@ -1710,5 +1709,5 @@ void diffcore_rename_extended(struct diff_options *options,
 
 void diffcore_rename(struct diff_options *options)
 {
-	diffcore_rename_extended(options, NULL, NULL, NULL, NULL);
+	diffcore_rename_extended(options, NULL, NULL, NULL, NULL, NULL);
 }
diff --git a/diffcore.h b/diffcore.h
index b58ee6b1934..badc2261c20 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -181,6 +181,7 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count);
 void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
 void diffcore_rename_extended(struct diff_options *options,
+			      struct mem_pool *pool,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
 			      struct strmap *dir_rename_count,
diff --git a/merge-ort.c b/merge-ort.c
index e79830f9181..f4f0a3d57f0 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -690,7 +690,6 @@ static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
-MAYBE_UNUSED
 static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 						 const char *path)
 {
@@ -712,7 +711,6 @@ static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 	return spec;
 }
 
-MAYBE_UNUSED
 static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
 					     struct diff_queue_struct *queue,
 					     struct diff_filespec *one,
@@ -930,6 +928,7 @@ static void add_pair(struct merge_options *opt,
 		     unsigned dir_rename_mask)
 {
 	struct diff_filespec *one, *two;
+	struct mem_pool *pool = opt->priv->pool;
 	struct rename_info *renames = &opt->priv->renames;
 	int names_idx = is_add ? side : 0;
 
@@ -980,11 +979,11 @@ static void add_pair(struct merge_options *opt,
 			return;
 	}
 
-	one = alloc_filespec(pathname);
-	two = alloc_filespec(pathname);
+	one = pool_alloc_filespec(pool, pathname);
+	two = pool_alloc_filespec(pool, pathname);
 	fill_filespec(is_add ? two : one,
 		      &names[names_idx].oid, 1, names[names_idx].mode);
-	diff_queue(&renames->pairs[side], one, two);
+	pool_diff_queue(pool, &renames->pairs[side], one, two);
 }
 
 static void collect_rename_info(struct merge_options *opt,
@@ -2893,6 +2892,7 @@ static void use_cached_pairs(struct merge_options *opt,
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *entry;
+	struct mem_pool *pool = opt->priv->pool;
 
 	/*
 	 * Add to side_pairs all entries from renames->cached_pairs[side_index].
@@ -2906,9 +2906,9 @@ static void use_cached_pairs(struct merge_options *opt,
 			new_name = old_name;
 
 		/* We don't care about oid/mode, only filenames and status */
-		one = alloc_filespec(old_name);
-		two = alloc_filespec(new_name);
-		diff_queue(pairs, one, two);
+		one = pool_alloc_filespec(pool, old_name);
+		two = pool_alloc_filespec(pool, new_name);
+		pool_diff_queue(pool, pairs, one, two);
 		pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D';
 	}
 }
@@ -3016,6 +3016,7 @@ static int detect_regular_renames(struct merge_options *opt,
 	diff_queued_diff = renames->pairs[side_index];
 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diffcore_rename_extended(&diff_opts,
+				 opt->priv->pool,
 				 &renames->relevant_sources[side_index],
 				 &renames->dirs_removed[side_index],
 				 &renames->dir_rename_count[side_index],
@@ -3066,7 +3067,7 @@ static int collect_renames(struct merge_options *opt,
 
 		if (p->status != 'A' && p->status != 'R') {
 			possibly_cache_new_pair(renames, p, side_index, NULL);
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 			continue;
 		}
 
@@ -3079,7 +3080,7 @@ static int collect_renames(struct merge_options *opt,
 
 		possibly_cache_new_pair(renames, p, side_index, new_path);
 		if (p->status != 'R' && !new_path) {
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 			continue;
 		}
 
@@ -3197,7 +3198,7 @@ cleanup:
 		side_pairs = &renames->pairs[s];
 		for (i = 0; i < side_pairs->nr; ++i) {
 			struct diff_filepair *p = side_pairs->queue[i];
-			diff_free_filepair(p);
+			pool_diff_free_filepair(opt->priv->pool, p);
 		}
 	}
 
@@ -3210,7 +3211,8 @@ simple_cleanup:
 	if (combined.nr) {
 		int i;
 		for (i = 0; i < combined.nr; i++)
-			diff_free_filepair(combined.queue[i]);
+			pool_diff_free_filepair(opt->priv->pool,
+						combined.queue[i]);
 		free(combined.queue);
 	}
 
-- 
gitgitgadget



* [PATCH v4 8/9] merge-ort: reuse path strings in pool_alloc_filespec
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
                         ` (6 preceding siblings ...)
  2021-07-31 17:27       ` [PATCH v4 7/9] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
@ 2021-07-31 17:27       ` Elijah Newren via GitGitGadget
  2021-07-31 17:27       ` [PATCH v4 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools Elijah Newren via GitGitGadget
                         ` (2 subsequent siblings)
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-31 17:27 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

pool_alloc_filespec() was written so that the code when pool != NULL
mimicked the code from alloc_filespec(), which included allocating
enough extra space for the path and then copying it.  However, the path
passed to pool_alloc_filespec() is always going to already be in the
same memory pool, so we may as well reuse it instead of copying it.
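
As an illustration of the idea above (not the code in this patch), here is a
minimal sketch using a toy bump allocator. Every name in it (toy_pool,
toy_pool_calloc, toy_pool_strdup, toy_filespec, toy_alloc_filespec) is
hypothetical and stands in for git's mem_pool and diff_filespec only loosely.
It shows a struct allocated from the pool whose path pointer is shared with a
string that already lives in the same pool, rather than copied.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump allocator for illustration; this is NOT git's mem_pool. */
union toy_align { long long ll; long double ld; void *ptr; };

struct toy_pool {
	char *buf;
	size_t used, cap;
};

/* A stand-in for diff_filespec with only the fields this sketch needs. */
struct toy_filespec {
	const char *path;
	int count;
};

static int toy_pool_init(struct toy_pool *pool, size_t cap)
{
	pool->buf = malloc(cap);
	pool->used = 0;
	pool->cap = pool->buf ? cap : 0;
	return pool->buf ? 0 : -1;
}

/* Zeroed allocation from the pool; returns NULL once the pool is exhausted. */
static void *toy_pool_calloc(struct toy_pool *pool, size_t size)
{
	const size_t unit = sizeof(union toy_align);
	size_t rounded;
	void *ret;

	assert(pool->buf);
	if (size > SIZE_MAX - unit)
		return NULL;
	rounded = (size + unit - 1) / unit * unit; /* keep offsets aligned */
	if (rounded > pool->cap - pool->used)
		return NULL;
	ret = pool->buf + pool->used;
	memset(ret, 0, rounded);
	pool->used += rounded;
	return ret;
}

/* Copy a string into the pool. */
static char *toy_pool_strdup(struct toy_pool *pool, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = toy_pool_calloc(pool, len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Allocate only the struct; the path is assumed to already live in the
 * same pool, so the pointer is shared instead of copying the string.
 */
static struct toy_filespec *toy_alloc_filespec(struct toy_pool *pool,
					       const char *path)
{
	struct toy_filespec *spec = toy_pool_calloc(pool, sizeof(*spec));

	if (!spec)
		return NULL;
	spec->path = path;
	spec->count = 1;
	return spec;
}

/* Release everything at once; no per-object free is needed. */
static void toy_pool_discard(struct toy_pool *pool)
{
	free(pool->buf);
	pool->buf = NULL;
	pool->used = pool->cap = 0;
}
```

The sharing is only safe while the string outlives the struct, which holds
here because both are released together by toy_pool_discard(). A caller whose
path does not already live in the pool would need toy_pool_strdup() first.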

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.5 ms ±  3.4 ms     198.3 ms ±  2.9 ms
    mega-renames:     679.1 ms ±  5.6 ms     661.8 ms ±  5.9 ms
    just-one-mega:    271.9 ms ±  2.8 ms     264.6 ms ±  2.5 ms
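
To put the table above in relative terms, the following small helper (a
hypothetical illustration, not part of the patch or of git's benchmarking
setup) computes the fractional reduction in mean runtime from a Before/After
pair.

```c
#include <stdio.h>

/* Relative reduction in mean runtime, as a fraction of the "before" value. */
static double relative_reduction(double before_ms, double after_ms)
{
	return (before_ms - after_ms) / before_ms;
}

/* Print one row of the table as a percentage reduction. */
static void print_reduction(const char *name, double before_ms,
			    double after_ms)
{
	printf("%-15s %5.1f%%\n", name,
	       100.0 * relative_reduction(before_ms, after_ms));
}
```

For example, print_reduction("mega-renames:", 679.1, 661.8) corresponds to
(679.1 - 661.8) / 679.1, or roughly 2.5%. The no-renames difference (0.2 ms)
is smaller than the quoted error bars, so it is best read as no change.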

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index f4f0a3d57f0..86ab8f60121 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -694,17 +694,13 @@ static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 						 const char *path)
 {
 	struct diff_filespec *spec;
-	size_t len;
 
 	if (!pool)
 		return alloc_filespec(path);
 
-	/* Same code as alloc_filespec, except allocate from pool */
-	len = strlen(path);
-
-	spec = mem_pool_calloc(pool, 1, st_add3(sizeof(*spec), len, 1));
-	memcpy(spec+1, path, len);
-	spec->path = (void*)(spec+1);
+	/* Similar to alloc_filespec, but allocate from pool and reuse path */
+	spec = mem_pool_calloc(pool, 1, sizeof(*spec));
+	spec->path = (char*)path; /* spec won't modify it */
 
 	spec->count = 1;
 	spec->is_binary = -1;
@@ -2904,6 +2900,25 @@ static void use_cached_pairs(struct merge_options *opt,
 		const char *new_name = entry->value;
 		if (!new_name)
 			new_name = old_name;
+		if (pool) {
+			/*
+			 * cached_pairs has _copies* of old_name and new_name,
+			 * because it has to persist across merges.  When
+			 *   pool != NULL
+			 * pool_alloc_filespec() will just re-use the existing
+			 * filenames, which will also get re-used by
+			 * opt->priv->paths if they become renames, and then
+			 * get freed at the end of the merge, leaving the copy
+			 * in cached_pairs dangling.  Avoid this by making a
+			 * copy here.
+			 *
+			 * When pool == NULL, pool_alloc_filespec() calls
+			 * alloc_filespec(), which makes a copy; we don't want
+			 * to add another.
+			 */
+			old_name = mem_pool_strdup(pool, old_name);
+			new_name = mem_pool_strdup(pool, new_name);
+		}
 
 		/* We don't care about oid/mode, only filenames and status */
 		one = pool_alloc_filespec(pool, old_name);
-- 
gitgitgadget



* [PATCH v4 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
                         ` (7 preceding siblings ...)
  2021-07-31 17:27       ` [PATCH v4 8/9] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
@ 2021-07-31 17:27       ` Elijah Newren via GitGitGadget
  2021-08-02 15:27       ` [PATCH v4 0/9] Final optimization batch (#15): use " Derrick Stolee
  2021-08-03 15:45       ` Jeff King
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-31 17:27 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Simplify code maintenance by removing the ability to toggle between
usage of memory pools and direct allocations.  This allows us to also
remove paths_to_free since it was solely about bookkeeping to make sure
we freed the necessary paths, and allows us to remove some auxiliary
functions.
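
A rough sketch of why the bookkeeping can go away, using hypothetical toy
types rather than git's mem_pool or merge_options_internal: once the pool is
embedded by value and always used, there is no NULL case to branch on, and a
string dropped from a lookup structure stays valid until the single discard
at the end, so no side list of strings to free later is required.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool: one malloc'd chunk per allocation, linked for bulk free. */
union toy_chunk {
	union toy_chunk *next;
	long long ll;    /* the extra members only force generous alignment */
	long double ld;
};

struct toy_pool {
	union toy_chunk *head;
};

/* The pool is embedded by value, so it is always present (never NULL). */
struct toy_state {
	struct toy_pool pool;
};

/* Copy a string into the pool; the caller never frees it individually. */
static char *toy_pool_strdup(struct toy_pool *pool, const char *s)
{
	size_t len = strlen(s) + 1;
	union toy_chunk *c = malloc(sizeof(*c) + len);

	if (!c)
		return NULL;
	c->next = pool->head;
	pool->head = c;
	memcpy(c + 1, s, len);
	return (char *)(c + 1);
}

/* Free every allocation at once; returns how many chunks were released. */
static size_t toy_pool_discard(struct toy_pool *pool)
{
	size_t nr = 0;

	while (pool->head) {
		union toy_chunk *c = pool->head;

		pool->head = c->next;
		free(c);
		nr++;
	}
	return nr;
}
```

Callers pass &state->pool unconditionally, which mirrors how the call sites in
this patch switch from opt->priv->pool to &opt->priv->pool.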

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 209 ++++++++++++----------------------------------------
 1 file changed, 47 insertions(+), 162 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 86ab8f60121..88ade50f4ed 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -37,8 +37,6 @@
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
 
-#define USE_MEMORY_POOL 1 /* faster, but obscures memory leak hunting */
-
 /*
  * We have many arrays of size 3.  Whenever we have such an array, the
  * indices refer to one of the sides of the three-way merge.  This is so
@@ -305,8 +303,6 @@ struct merge_options_internal {
 	 *   * these keys serve to intern all the path strings, which allows
 	 *     us to do pointer comparison on directory names instead of
 	 *     strcmp; we just have to be careful to use the interned strings.
-	 *     (Technically paths_to_free may track some strings that were
-	 *      removed from froms paths.)
 	 *
 	 * The values of paths:
 	 *   * either a pointer to a merged_info, or a conflict_info struct
@@ -349,18 +345,7 @@ struct merge_options_internal {
 	 * freed together too.  Using a memory pool for these provides a
 	 * nice speedup.
 	 */
-	struct mem_pool internal_pool;
-	struct mem_pool *pool; /* NULL, or pointer to internal_pool */
-
-	/*
-	 * paths_to_free: additional list of strings to free
-	 *
-	 * If keys are removed from "paths", they are added to paths_to_free
-	 * to ensure they are later freed.  We avoid free'ing immediately since
-	 * other places (e.g. conflict_info.pathnames[]) may still be
-	 * referencing these paths.
-	 */
-	struct string_list paths_to_free;
+	struct mem_pool pool;
 
 	/*
 	 * output: special messages and conflict notices for various paths
@@ -539,19 +524,7 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	void (*strset_clear_func)(struct strset *) =
 		reinitialize ? strset_partial_clear : strset_clear;
 
-	if (opti->pool)
-		strmap_clear_func(&opti->paths, 0);
-	else {
-		/*
-		 * We marked opti->paths with strdup_strings = 0, so that
-		 * we wouldn't have to make another copy of the fullpath
-		 * created by make_traverse_path from setup_path_info().
-		 * But, now that we've used it and have no other references
-		 * to these strings, it is time to deallocate them.
-		 */
-		free_strmap_strings(&opti->paths);
-		strmap_clear_func(&opti->paths, 1);
-	}
+	strmap_clear_func(&opti->paths, 0);
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
@@ -560,20 +533,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	 */
 	strmap_clear_func(&opti->conflicted, 0);
 
-	if (!opti->pool) {
-		/*
-		 * opti->paths_to_free is similar to opti->paths; we
-		 * created it with strdup_strings = 0 to avoid making
-		 * _another_ copy of the fullpath but now that we've used
-		 * it and have no other references to these strings, it is
-		 * time to deallocate them.  We do so by temporarily
-		 * setting strdup_strings to 1.
-		 */
-		opti->paths_to_free.strdup_strings = 1;
-		string_list_clear(&opti->paths_to_free, 0);
-		opti->paths_to_free.strdup_strings = 0;
-	}
-
 	if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
 		discard_index(&opti->attr_index);
 
@@ -623,11 +582,7 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_clear(&opti->output, 0);
 	}
 
-#if USE_MEMORY_POOL
-	mem_pool_discard(&opti->internal_pool, 0);
-	if (!reinitialize)
-		opti->pool = NULL;
-#endif
+	mem_pool_discard(&opti->pool, 0);
 
 	/* Clean out callback_data as well. */
 	FREE_AND_NULL(renames->callback_data);
@@ -693,12 +648,9 @@ static void path_msg(struct merge_options *opt,
 static struct diff_filespec *pool_alloc_filespec(struct mem_pool *pool,
 						 const char *path)
 {
+	/* Similar to alloc_filespec(), but allocate from pool and reuse path */
 	struct diff_filespec *spec;
 
-	if (!pool)
-		return alloc_filespec(path);
-
-	/* Similar to alloc_filespec, but allocate from pool and reuse path */
 	spec = mem_pool_calloc(pool, 1, sizeof(*spec));
 	spec->path = (char*)path; /* spec won't modify it */
 
@@ -712,12 +664,9 @@ static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
 					     struct diff_filespec *one,
 					     struct diff_filespec *two)
 {
+	/* Same code as diff_queue(), except allocate from pool */
 	struct diff_filepair *dp;
 
-	if (!pool)
-		return diff_queue(queue, one, two);
-
-	/* Same code as diff_queue, except allocate from pool */
 	dp = mem_pool_calloc(pool, 1, sizeof(*dp));
 	dp->one = one;
 	dp->two = two;
@@ -726,27 +675,6 @@ static struct diff_filepair *pool_diff_queue(struct mem_pool *pool,
 	return dp;
 }
 
-static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
-{
-	if (!pool)
-		return xcalloc(count, size);
-	return mem_pool_calloc(pool, count, size);
-}
-
-static void *pool_alloc(struct mem_pool *pool, size_t size)
-{
-	if (!pool)
-		return xmalloc(size);
-	return mem_pool_alloc(pool, size);
-}
-
-static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
-{
-	if (!pool)
-		return xstrndup(str, len);
-	return mem_pool_strndup(pool, str, len);
-}
-
 /* add a string to a strbuf, but converting "/" to "_" */
 static void add_flattened_path(struct strbuf *out, const char *s)
 {
@@ -875,9 +803,9 @@ static void setup_path_info(struct merge_options *opt,
 	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
 	assert(resolved == (merged_version != NULL));
 
-	mi = pool_calloc(opt->priv->pool, 1,
-			 resolved ? sizeof(struct merged_info) :
-				    sizeof(struct conflict_info));
+	mi = mem_pool_calloc(&opt->priv->pool, 1,
+			     resolved ? sizeof(struct merged_info) :
+					sizeof(struct conflict_info));
 	mi->directory_name = current_dir_name;
 	mi->basename_offset = current_dir_name_len;
 	mi->clean = !!resolved;
@@ -924,7 +852,6 @@ static void add_pair(struct merge_options *opt,
 		     unsigned dir_rename_mask)
 {
 	struct diff_filespec *one, *two;
-	struct mem_pool *pool = opt->priv->pool;
 	struct rename_info *renames = &opt->priv->renames;
 	int names_idx = is_add ? side : 0;
 
@@ -975,11 +902,11 @@ static void add_pair(struct merge_options *opt,
 			return;
 	}
 
-	one = pool_alloc_filespec(pool, pathname);
-	two = pool_alloc_filespec(pool, pathname);
+	one = pool_alloc_filespec(&opt->priv->pool, pathname);
+	two = pool_alloc_filespec(&opt->priv->pool, pathname);
 	fill_filespec(is_add ? two : one,
 		      &names[names_idx].oid, 1, names[names_idx].mode);
-	pool_diff_queue(pool, &renames->pairs[side], one, two);
+	pool_diff_queue(&opt->priv->pool, &renames->pairs[side], one, two);
 }
 
 static void collect_rename_info(struct merge_options *opt,
@@ -1170,7 +1097,7 @@ static int collect_merge_info_callback(int n,
 	len = traverse_path_len(info, p->pathlen);
 
 	/* +1 in both of the following lines to include the NUL byte */
-	fullpath = pool_alloc(opt->priv->pool, len + 1);
+	fullpath = mem_pool_alloc(&opt->priv->pool, len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
@@ -1425,7 +1352,7 @@ static int handle_deferred_entries(struct merge_options *opt,
 		copy = renames->deferred[side].possible_trivial_merges;
 		strintmap_init_with_options(&renames->deferred[side].possible_trivial_merges,
 					    0,
-					    opt->priv->pool,
+					    &opt->priv->pool,
 					    0);
 		strintmap_for_each_entry(&copy, &iter, entry) {
 			const char *path = entry->key;
@@ -2377,21 +2304,17 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 	VERIFY_CI(ci);
 
 	/* Find parent directories missing from opt->priv->paths */
-	if (opt->priv->pool) {
-		cur_path = mem_pool_strdup(opt->priv->pool, new_path);
-		free((char*)new_path);
-		new_path = (char *)cur_path;
-	} else {
-		cur_path = new_path;
-	}
+	cur_path = mem_pool_strdup(&opt->priv->pool, new_path);
+	free((char*)new_path);
+	new_path = (char *)cur_path;
 
 	while (1) {
 		/* Find the parent directory of cur_path */
 		char *last_slash = strrchr(cur_path, '/');
 		if (last_slash) {
-			parent_name = pool_strndup(opt->priv->pool,
-						   cur_path,
-						   last_slash - cur_path);
+			parent_name = mem_pool_strndup(&opt->priv->pool,
+						       cur_path,
+						       last_slash - cur_path);
 		} else {
 			parent_name = opt->priv->toplevel_dir;
 			break;
@@ -2400,8 +2323,6 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		/* Look it up in opt->priv->paths */
 		entry = strmap_get_entry(&opt->priv->paths, parent_name);
 		if (entry) {
-			if (!opt->priv->pool)
-				free((char*)parent_name);
 			parent_name = entry->key; /* reuse known pointer */
 			break;
 		}
@@ -2428,16 +2349,6 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		parent_name = cur_dir;
 	}
 
-	if (!opt->priv->pool) {
-		/*
-		 * We are removing old_path from opt->priv->paths.
-		 * old_path also will eventually need to be freed, but it
-		 * may still be used by e.g.  ci->pathnames.  So, store it
-		 * in another string-list for now.
-		 */
-		string_list_append(&opt->priv->paths_to_free, old_path);
-	}
-
 	assert(ci->filemask == 2 || ci->filemask == 4);
 	assert(ci->dirmask == 0);
 	strmap_remove(&opt->priv->paths, old_path, 0);
@@ -2471,8 +2382,6 @@ static void apply_directory_rename_modifications(struct merge_options *opt,
 		new_ci->stages[index].mode = ci->stages[index].mode;
 		oidcpy(&new_ci->stages[index].oid, &ci->stages[index].oid);
 
-		if (!opt->priv->pool)
-			free(ci);
 		ci = new_ci;
 	}
 
@@ -2888,7 +2797,6 @@ static void use_cached_pairs(struct merge_options *opt,
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *entry;
-	struct mem_pool *pool = opt->priv->pool;
 
 	/*
 	 * Add to side_pairs all entries from renames->cached_pairs[side_index].
@@ -2900,30 +2808,24 @@ static void use_cached_pairs(struct merge_options *opt,
 		const char *new_name = entry->value;
 		if (!new_name)
 			new_name = old_name;
-		if (pool) {
-			/*
-			 * cached_pairs has _copies* of old_name and new_name,
-			 * because it has to persist across merges.  When
-			 *   pool != NULL
-			 * pool_alloc_filespec() will just re-use the existing
-			 * filenames, which will also get re-used by
-			 * opt->priv->paths if they become renames, and then
-			 * get freed at the end of the merge, leaving the copy
-			 * in cached_pairs dangling.  Avoid this by making a
-			 * copy here.
-			 *
-			 * When pool == NULL, pool_alloc_filespec() calls
-			 * alloc_filespec(), which makes a copy; we don't want
-			 * to add another.
-			 */
-			old_name = mem_pool_strdup(pool, old_name);
-			new_name = mem_pool_strdup(pool, new_name);
-		}
+
+		/*
+		 * cached_pairs has *copies* of old_name and new_name,
+		 * because it has to persist across merges.  Since
+		 * pool_alloc_filespec() will just re-use the existing
+		 * filenames, which will also get re-used by
+		 * opt->priv->paths if they become renames, and then
+		 * get freed at the end of the merge, that would leave
+		 * the copy in cached_pairs dangling.  Avoid this by
+		 * making a copy here.
+		 */
+		old_name = mem_pool_strdup(&opt->priv->pool, old_name);
+		new_name = mem_pool_strdup(&opt->priv->pool, new_name);
 
 		/* We don't care about oid/mode, only filenames and status */
-		one = pool_alloc_filespec(pool, old_name);
-		two = pool_alloc_filespec(pool, new_name);
-		pool_diff_queue(pool, pairs, one, two);
+		one = pool_alloc_filespec(&opt->priv->pool, old_name);
+		two = pool_alloc_filespec(&opt->priv->pool, new_name);
+		pool_diff_queue(&opt->priv->pool, pairs, one, two);
 		pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D';
 	}
 }
@@ -3031,7 +2933,7 @@ static int detect_regular_renames(struct merge_options *opt,
 	diff_queued_diff = renames->pairs[side_index];
 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diffcore_rename_extended(&diff_opts,
-				 opt->priv->pool,
+				 &opt->priv->pool,
 				 &renames->relevant_sources[side_index],
 				 &renames->dirs_removed[side_index],
 				 &renames->dir_rename_count[side_index],
@@ -3082,7 +2984,7 @@ static int collect_renames(struct merge_options *opt,
 
 		if (p->status != 'A' && p->status != 'R') {
 			possibly_cache_new_pair(renames, p, side_index, NULL);
-			pool_diff_free_filepair(opt->priv->pool, p);
+			pool_diff_free_filepair(&opt->priv->pool, p);
 			continue;
 		}
 
@@ -3095,7 +2997,7 @@ static int collect_renames(struct merge_options *opt,
 
 		possibly_cache_new_pair(renames, p, side_index, new_path);
 		if (p->status != 'R' && !new_path) {
-			pool_diff_free_filepair(opt->priv->pool, p);
+			pool_diff_free_filepair(&opt->priv->pool, p);
 			continue;
 		}
 
@@ -3213,7 +3115,7 @@ cleanup:
 		side_pairs = &renames->pairs[s];
 		for (i = 0; i < side_pairs->nr; ++i) {
 			struct diff_filepair *p = side_pairs->queue[i];
-			pool_diff_free_filepair(opt->priv->pool, p);
+			pool_diff_free_filepair(&opt->priv->pool, p);
 		}
 	}
 
@@ -3226,7 +3128,7 @@ simple_cleanup:
 	if (combined.nr) {
 		int i;
 		for (i = 0; i < combined.nr; i++)
-			pool_diff_free_filepair(opt->priv->pool,
+			pool_diff_free_filepair(&opt->priv->pool,
 						combined.queue[i]);
 		free(combined.queue);
 	}
@@ -3701,7 +3603,7 @@ static void process_entry(struct merge_options *opt,
 		 * the directory to remain here, so we need to move this
 		 * path to some new location.
 		 */
-		new_ci = pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
+		new_ci = mem_pool_calloc(&opt->priv->pool, 1, sizeof(*new_ci));
 
 		/* We don't really want new_ci->merged.result copied, but it'll
 		 * be overwritten below so it doesn't matter.  We also don't
@@ -3794,7 +3696,8 @@ static void process_entry(struct merge_options *opt,
 			const char *a_path = NULL, *b_path = NULL;
 			int rename_a = 0, rename_b = 0;
 
-			new_ci = pool_alloc(opt->priv->pool, sizeof(*new_ci));
+			new_ci = mem_pool_alloc(&opt->priv->pool,
+						sizeof(*new_ci));
 
 			if (S_ISREG(a_mode))
 				rename_a = 1;
@@ -3863,19 +3766,8 @@ static void process_entry(struct merge_options *opt,
 				b_path = path;
 			strmap_put(&opt->priv->paths, b_path, new_ci);
 
-			if (rename_a && rename_b) {
+			if (rename_a && rename_b)
 				strmap_remove(&opt->priv->paths, path, 0);
-				/*
-				 * We removed path from opt->priv->paths.  path
-				 * will also eventually need to be freed if not
-				 * part of a memory pool...but it may still be
-				 * used by e.g. ci->pathnames.  So, store it in
-				 * another string-list for now in that case.
-				 */
-				if (!opt->priv->pool)
-					string_list_append(&opt->priv->paths_to_free,
-							   path);
-			}
 
 			/*
 			 * Do special handling for b_path since process_entry()
@@ -4482,13 +4374,8 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	/* Initialization of various renames fields */
 	renames = &opt->priv->renames;
-#if USE_MEMORY_POOL
-	mem_pool_init(&opt->priv->internal_pool, 0);
-	opt->priv->pool = &opt->priv->internal_pool;
-#else
-	opt->priv->pool = NULL;
-#endif
-	pool = opt->priv->pool;
+	mem_pool_init(&opt->priv->pool, 0);
+	pool = &opt->priv->pool;
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
 					    NOT_RELEVANT, pool, 0);
@@ -4525,15 +4412,13 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * Although we initialize opt->priv->paths with strdup_strings=0,
 	 * that's just to avoid making yet another copy of an allocated
 	 * string.  Putting the entry into paths means we are taking
-	 * ownership, so we will later free it.  paths_to_free is similar.
+	 * ownership, so we will later free it.
 	 *
 	 * In contrast, conflicted just has a subset of keys from paths, so
 	 * we don't want to free those (it'd be a duplicate free).
 	 */
 	strmap_init_with_options(&opt->priv->paths, pool, 0);
 	strmap_init_with_options(&opt->priv->conflicted, pool, 0);
-	if (!opt->priv->pool)
-		string_list_init_nodup(&opt->priv->paths_to_free);
 
 	/*
 	 * keys & strbufs in output will sometimes need to outlive "paths",
-- 
gitgitgadget


* Re: [PATCH v4 0/9] Final optimization batch (#15): use memory pools
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
                         ` (8 preceding siblings ...)
  2021-07-31 17:27       ` [PATCH v4 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools Elijah Newren via GitGitGadget
@ 2021-08-02 15:27       ` Derrick Stolee
  2021-08-03 15:45       ` Jeff King
  10 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-08-02 15:27 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Jeff King, Eric Sunshine, Elijah Newren,
	Ævar Arnfjörð Bjarmason

On 7/31/2021 1:27 PM, Elijah Newren via GitGitGadget wrote:
> This series textually depends on en/ort-perf-batch-14, but the ideas are
> orthogonal to it and orthogonal to previous series. It can be reviewed
> independently.
> 
> Changes since v1, addressing Eric's feedback:
> 
>  * Fixed a comment that became out-of-date in patch 1
>  * Swapped commits 2 and 3 so that one can better motivate the other.
> 
> Changes since v2, addressing Peff's feedback:
> 
>  * Rebased on en/ort-perf-batch-14 (resolving a trivial conflict with the
>    new string_list_init_nodup() usage)
>  * Added a new preliminary patch renaming str*_func() to str*_clear_func()
>  * Added a new final patch that hardcodes that we'll just use memory pools
> 
> Changes since v3, as per Peff's feedback:
> 
>  * Don't only remove the extra complexity from the USE_MEMORY_POOL #define;
>    also remove the original bookkeeping complexity needed to track
>    individual frees when not using a memory pool.

I read the discussion leading to these changes and gave this version another
pass. Looks good to me. Thanks!

-Stolee


* Re: [PATCH v4 0/9] Final optimization batch (#15): use memory pools
  2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
                         ` (9 preceding siblings ...)
  2021-08-02 15:27       ` [PATCH v4 0/9] Final optimization batch (#15): use " Derrick Stolee
@ 2021-08-03 15:45       ` Jeff King
  10 siblings, 0 replies; 65+ messages in thread
From: Jeff King @ 2021-08-03 15:45 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Sat, Jul 31, 2021 at 05:27:29PM +0000, Elijah Newren via GitGitGadget wrote:

> Changes since v3, as per Peff's feedback:
> 
>  * Don't only remove the extra complexity from the USE_MEMORY_POOL #define;
>    also remove the original bookkeeping complexity needed to track
>    individual frees when not using a memory pool.

Thanks. I skimmed over the changes in patch 9, under the assumption that
they were mostly what I showed before. ;)

The whole thing looks good to me.

-Peff


end of thread, other threads:[~2021-08-03 15:45 UTC | newest]

Thread overview: 65+ messages
2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
2021-07-23 12:54 ` [PATCH 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
2021-07-23 21:59   ` Eric Sunshine
2021-07-23 22:03     ` Elijah Newren
2021-07-23 12:54 ` [PATCH 2/7] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
2021-07-23 12:54 ` [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
2021-07-23 22:07   ` Eric Sunshine
2021-07-26 14:36   ` Derrick Stolee
2021-07-28 22:49     ` Elijah Newren
2021-07-29 15:26       ` Jeff King
2021-07-30  2:27         ` Elijah Newren
2021-07-30 16:12           ` Jeff King
2021-07-23 12:54 ` [PATCH 4/7] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
2021-07-23 12:54 ` [PATCH 5/7] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
2021-07-23 12:54 ` [PATCH 6/7] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
2021-07-23 12:54 ` [PATCH 7/7] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
2021-07-26 14:44 ` [PATCH 0/7] Final optimization batch (#15): use memory pools Derrick Stolee
2021-07-28 22:52   ` Elijah Newren
2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 2/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 3/7] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
2021-07-29 15:28     ` Jeff King
2021-07-29 18:37       ` Elijah Newren
2021-07-29 20:09         ` Jeff King
2021-07-30  2:30           ` Elijah Newren
2021-07-30 16:12             ` Jeff King
2021-07-30 13:30           ` Ævar Arnfjörð Bjarmason
2021-07-30 14:36             ` Elijah Newren
2021-07-30 16:23               ` Ævar Arnfjörð Bjarmason
2021-07-29  3:58   ` [PATCH v2 5/7] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 6/7] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 7/7] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
2021-07-29 14:58   ` [PATCH v2 0/7] Final optimization batch (#15): use memory pools Derrick Stolee
2021-07-29 16:20   ` Jeff King
2021-07-29 16:23     ` Jeff King
2021-07-29 19:46       ` Junio C Hamano
2021-07-29 20:48         ` Junio C Hamano
2021-07-29 21:05           ` Elijah Newren
2021-07-29 20:46     ` Elijah Newren
2021-07-29 21:14       ` Jeff King
2021-07-30 11:47   ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 1/9] merge-ort: rename str{map,intmap,set}_func() Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 2/9] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 3/9] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 4/9] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 5/9] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 6/9] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 7/9] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 8/9] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools Elijah Newren via GitGitGadget
2021-07-30 16:24       ` Jeff King
2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 1/9] merge-ort: rename str{map,intmap,set}_func() Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 2/9] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 3/9] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 4/9] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 5/9] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 6/9] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 7/9] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 8/9] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools Elijah Newren via GitGitGadget
2021-08-02 15:27       ` [PATCH v4 0/9] Final optimization batch (#15): use " Derrick Stolee
2021-08-03 15:45       ` Jeff King
