* [PATCH v2 1/5] commit-reach: reduce requirements for remove_redundant()
2021-02-01 12:47 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
@ 2021-02-01 12:47 ` Derrick Stolee via GitGitGadget
2021-02-01 19:51 ` Junio C Hamano
2021-02-01 12:47 ` [PATCH v2 2/5] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
` (5 subsequent siblings)
6 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-01 12:47 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Remove a comment at the beggining of remove_redundant() that mentions a
reordering of the input array to have the initial segment be the
independent commits and the final segment be the redundant commits.
While this behavior is followed in remove_redundant(), no callers rely
on that behavior.
Remove the final loop that copies this final segment and update the
comment to match the new behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
commit-reach.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index e38771ca5a1..9af51fe7e07 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -160,9 +160,10 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
{
/*
* Some commit in the array may be an ancestor of
- * another commit. Move such commit to the end of
- * the array, and return the number of commits that
- * are independent from each other.
+ * another commit. Move the independent commits to the
+ * beginning of 'array' and return their number. Callers
+ * should not rely upon the contents of 'array' after
+ * that number.
*/
struct commit **work;
unsigned char *redundant;
@@ -209,9 +210,6 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
for (i = filled = 0; i < cnt; i++)
if (!redundant[i])
array[filled++] = work[i];
- for (j = filled, i = 0; i < cnt; i++)
- if (redundant[i])
- array[j++] = work[i];
free(work);
free(redundant);
free(filled_index);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2 1/5] commit-reach: reduce requirements for remove_redundant()
2021-02-01 12:47 ` [PATCH v2 1/5] commit-reach: reduce requirements for remove_redundant() Derrick Stolee via GitGitGadget
@ 2021-02-01 19:51 ` Junio C Hamano
0 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2021-02-01 19:51 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, Michael Haggerty, me, peff, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Remove a comment at the beggining of remove_redundant() that mentions a
> reordering of the input array to have the initial segment be the
> independent commits and the final segment be the redundant commits.
> While this behavior is followed in remove_redundant(), no callers rely
> on that behavior.
>
> Remove the final loop that copies this final segment and update the
> comment to match the new behavior.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
Makes sense, especially since this is a file-local/static helper
function, we have reasonably tight control over its callers.
Will queue. Thanks.
> commit-reach.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/commit-reach.c b/commit-reach.c
> index e38771ca5a1..9af51fe7e07 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -160,9 +160,10 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
> {
> /*
> * Some commit in the array may be an ancestor of
> - * another commit. Move such commit to the end of
> - * the array, and return the number of commits that
> - * are independent from each other.
> + * another commit. Move the independent commits to the
> + * beginning of 'array' and return their number. Callers
> + * should not rely upon the contents of 'array' after
> + * that number.
> */
> struct commit **work;
> unsigned char *redundant;
> @@ -209,9 +210,6 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
> for (i = filled = 0; i < cnt; i++)
> if (!redundant[i])
> array[filled++] = work[i];
> - for (j = filled, i = 0; i < cnt; i++)
> - if (redundant[i])
> - array[j++] = work[i];
> free(work);
> free(redundant);
> free(filled_index);
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 2/5] commit-reach: use one walk in remove_redundant()
2021-02-01 12:47 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2021-02-01 12:47 ` [PATCH v2 1/5] commit-reach: reduce requirements for remove_redundant() Derrick Stolee via GitGitGadget
@ 2021-02-01 12:47 ` Derrick Stolee via GitGitGadget
2021-02-01 16:12 ` René Scharfe.
2021-02-01 12:47 ` [PATCH v2 3/5] commit-reach: move compare_commits_by_gen Derrick Stolee via GitGitGadget
` (4 subsequent siblings)
6 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-01 12:47 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
The current implementation of remove_redundant() uses several calls to
paint_down_to_common() to determine that commits are independent of each
other. This leads to quadratic behavior when many inputs are passed to
commands such as 'git merge-base'.
For example, in the Linux kernel repository, I tested the performance
by passing all tags:
git merge-base --independent $(git for-each-ref refs/tags --format="$(refname)")
(Note: I had to delete the tags v2.6.11-tree and v2.6.11 as they do
not point to commits.)
Here is the performance improvement introduced by this change:
Before: 16.4s
After: 1.1s
This performance improvement requires the commit-graph file to be
present. We keep the old algorithm around as remove_redundant_no_gen()
and use it when generation_numbers_enabled() is false. This is similar
to other algorithms within commit-reach.c. The new algorithm is
implemented in remove_redundant_with_gen().
The basic approach is to do one commit walk instead of many. First, scan
all commits in the list and mark their _parents_ with the STALE flag.
This flag will indicate commits that are reachable from one of the
inputs, except not including themselves. Then, walk commits until
covering all commits up to the minimum generation number pushing the
STALE flag throughout.
At the end, we need to clear the STALE bit from all of the commits
we walked. We move the non-stale commits in 'array' to the beginning of
the list, and this might overwrite stale commits. However, we store an
array of commits that started the walk, and use clear_commit_marks() on
each of those starting commits. That method will walk the reachable
commits with the STALE bit and clear them all. This makes the algorithm
safe for re-entry or for other uses of those commits after this walk.
This logic is covered by tests in t6600-test-reach.sh, so the behavior
does not change. This is tested both in the case with a commit-graph and
without.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
commit-reach.c | 108 +++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 100 insertions(+), 8 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index 9af51fe7e07..053676e51d0 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -156,15 +156,9 @@ struct commit_list *get_octopus_merge_bases(struct commit_list *in)
return ret;
}
-static int remove_redundant(struct repository *r, struct commit **array, int cnt)
+static int remove_redundant_no_gen(struct repository *r,
+ struct commit **array, int cnt)
{
- /*
- * Some commit in the array may be an ancestor of
- * another commit. Move the independent commits to the
- * beginning of 'array' and return their number. Callers
- * should not rely upon the contents of 'array' after
- * that number.
- */
struct commit **work;
unsigned char *redundant;
int *filled_index;
@@ -210,12 +204,110 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
for (i = filled = 0; i < cnt; i++)
if (!redundant[i])
array[filled++] = work[i];
+ for (j = filled, i = 0; i < cnt; i++)
+ if (redundant[i])
+ array[j++] = work[i];
free(work);
free(redundant);
free(filled_index);
return filled;
}
+static int remove_redundant_with_gen(struct repository *r,
+ struct commit **array, int cnt)
+{
+ int i, count_non_stale = 0;
+ timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
+ struct commit **walk_start;
+ size_t walk_start_nr = 0, walk_start_alloc = cnt;
+ struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+
+ ALLOC_ARRAY(walk_start, walk_start_alloc);
+
+ /* Mark all parents of the input as STALE */
+ for (i = 0; i < cnt; i++) {
+ struct commit_list *parents;
+ timestamp_t generation;
+
+ repo_parse_commit(r, array[i]);
+ parents = array[i]->parents;
+
+ while (parents) {
+ repo_parse_commit(r, parents->item);
+ if (!(parents->item->object.flags & STALE)) {
+ parents->item->object.flags |= STALE;
+ ALLOC_GROW(walk_start, walk_start_nr + 1, walk_start_alloc);
+ walk_start[walk_start_nr++] = parents->item;
+ prio_queue_put(&queue, parents->item);
+ }
+ parents = parents->next;
+ }
+
+ generation = commit_graph_generation(array[i]);
+
+ if (generation < min_generation)
+ min_generation = generation;
+ }
+
+ /* push the STALE bits up to min generation */
+ while (queue.nr) {
+ struct commit_list *parents;
+ struct commit *c = prio_queue_get(&queue);
+
+ repo_parse_commit(r, c);
+
+ if (commit_graph_generation(c) < min_generation)
+ continue;
+
+ parents = c->parents;
+ while (parents) {
+ if (!(parents->item->object.flags & STALE)) {
+ parents->item->object.flags |= STALE;
+ prio_queue_put(&queue, parents->item);
+ }
+ parents = parents->next;
+ }
+ }
+
+ /* rearrange array */
+ for (i = count_non_stale = 0; i < cnt; i++) {
+ if (!(array[i]->object.flags & STALE))
+ array[count_non_stale++] = array[i];
+ }
+
+ /* clear marks */
+ for (i = 0; i < walk_start_nr; i++)
+ clear_commit_marks(walk_start[i], STALE);
+ free(walk_start);
+
+ return count_non_stale;
+}
+
+static int remove_redundant(struct repository *r, struct commit **array, int cnt)
+{
+ /*
+ * Some commit in the array may be an ancestor of
+ * another commit. Move the independent commits to the
+ * beginning of 'array' and return their number. Callers
+ * should not rely upon the contents of 'array' after
+ * that number.
+ */
+ if (generation_numbers_enabled(r)) {
+ int i;
+
+ /*
+ * If we have a single commit with finite generation
+ * number, then the _with_gen algorithm is preferred.
+ */
+ for (i = 0; i < cnt; i++) {
+ if (commit_graph_generation(array[i]) < GENERATION_NUMBER_INFINITY)
+ return remove_redundant_with_gen(r, array, cnt);
+ }
+ }
+
+ return remove_redundant_no_gen(r, array, cnt);
+}
+
static struct commit_list *get_merge_bases_many_0(struct repository *r,
struct commit *one,
int n,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2 2/5] commit-reach: use one walk in remove_redundant()
2021-02-01 12:47 ` [PATCH v2 2/5] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
@ 2021-02-01 16:12 ` René Scharfe.
2021-02-01 16:31 ` Derrick Stolee
0 siblings, 1 reply; 36+ messages in thread
From: René Scharfe. @ 2021-02-01 16:12 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget, git
Cc: Michael Haggerty, me, peff, gitster, Derrick Stolee,
Derrick Stolee, Derrick Stolee
Am 01.02.21 um 13:47 schrieb Derrick Stolee via GitGitGadget:
> @@ -210,12 +204,110 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
> for (i = filled = 0; i < cnt; i++)
> if (!redundant[i])
> array[filled++] = work[i];
> + for (j = filled, i = 0; i < cnt; i++)
> + if (redundant[i])
> + array[j++] = work[i];
This puts the loop back in that you removed in the previous commit.
Intentionally?
> free(work);
> free(redundant);
> free(filled_index);
> return filled;
> }
>
> +static int remove_redundant_with_gen(struct repository *r,
> + struct commit **array, int cnt)
> +{
> + int i, count_non_stale = 0;
> + timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
> + struct commit **walk_start;
> + size_t walk_start_nr = 0, walk_start_alloc = cnt;
> + struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> +
> + ALLOC_ARRAY(walk_start, walk_start_alloc);
> +
> + /* Mark all parents of the input as STALE */
> + for (i = 0; i < cnt; i++) {
> + struct commit_list *parents;
> + timestamp_t generation;
> +
> + repo_parse_commit(r, array[i]);
> + parents = array[i]->parents;
> +
> + while (parents) {
> + repo_parse_commit(r, parents->item);
> + if (!(parents->item->object.flags & STALE)) {
> + parents->item->object.flags |= STALE;
> + ALLOC_GROW(walk_start, walk_start_nr + 1, walk_start_alloc);
> + walk_start[walk_start_nr++] = parents->item;
> + prio_queue_put(&queue, parents->item);
> + }
> + parents = parents->next;
> + }
> +
> + generation = commit_graph_generation(array[i]);
> +
> + if (generation < min_generation)
> + min_generation = generation;
> + }
> +
> + /* push the STALE bits up to min generation */
> + while (queue.nr) {
> + struct commit_list *parents;
> + struct commit *c = prio_queue_get(&queue);
> +
> + repo_parse_commit(r, c);
> +
> + if (commit_graph_generation(c) < min_generation)
> + continue;
> +
> + parents = c->parents;
> + while (parents) {
> + if (!(parents->item->object.flags & STALE)) {
> + parents->item->object.flags |= STALE;
> + prio_queue_put(&queue, parents->item);
> + }
> + parents = parents->next;
> + }
> + }
> +
> + /* rearrange array */
> + for (i = count_non_stale = 0; i < cnt; i++) {
> + if (!(array[i]->object.flags & STALE))
Here I would have added another condition, count_non_stale != i, to
avoid self-assignment (array[x] = array[x]). The code works without
it, though. Not sure if there is a performance benefit to be had --
branch vs. pointer copy. Probably not worth it..
> + array[count_non_stale++] = array[i];
> + }
> +
> + /* clear marks */
> + for (i = 0; i < walk_start_nr; i++)
> + clear_commit_marks(walk_start[i], STALE);
You can replace this loop with a call to clear_commit_marks_many().
> + free(walk_start);
> +
> + return count_non_stale;
> +}
> +
> +static int remove_redundant(struct repository *r, struct commit **array, int cnt)
> +{
> + /*
> + * Some commit in the array may be an ancestor of
> + * another commit. Move the independent commits to the
> + * beginning of 'array' and return their number. Callers
> + * should not rely upon the contents of 'array' after
> + * that number.
> + */
> + if (generation_numbers_enabled(r)) {
> + int i;
> +
> + /*
> + * If we have a single commit with finite generation
> + * number, then the _with_gen algorithm is preferred.
> + */
> + for (i = 0; i < cnt; i++) {
> + if (commit_graph_generation(array[i]) < GENERATION_NUMBER_INFINITY)
> + return remove_redundant_with_gen(r, array, cnt);
> + }
> + }
> +
> + return remove_redundant_no_gen(r, array, cnt);
> +}
> +
> static struct commit_list *get_merge_bases_many_0(struct repository *r,
> struct commit *one,
> int n,
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 2/5] commit-reach: use one walk in remove_redundant()
2021-02-01 16:12 ` René Scharfe.
@ 2021-02-01 16:31 ` Derrick Stolee
0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2021-02-01 16:31 UTC (permalink / raw)
To: René Scharfe., Derrick Stolee via GitGitGadget, git
Cc: Michael Haggerty, me, peff, gitster, Derrick Stolee, Derrick Stolee
On 2/1/2021 11:12 AM, René Scharfe. wrote:
> Am 01.02.21 um 13:47 schrieb Derrick Stolee via GitGitGadget:
>> @@ -210,12 +204,110 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
>> for (i = filled = 0; i < cnt; i++)
>> if (!redundant[i])
>> array[filled++] = work[i];
>> + for (j = filled, i = 0; i < cnt; i++)
>> + if (redundant[i])
>> + array[j++] = work[i];
>
> This puts the loop back in that you removed in the previous commit.
> Intentionally?
Not intentional. Thanks for noticing.
>> + /* rearrange array */
>> + for (i = count_non_stale = 0; i < cnt; i++) {
>> + if (!(array[i]->object.flags & STALE))
>
> Here I would have added another condition, count_non_stale != i, to
> avoid self-assignment (array[x] = array[x]). The code works without
> it, though. Not sure if there is a performance benefit to be had --> branch vs. pointer copy. Probably not worth it..
You are correct, but I'm going to go on the side of not worth it.
>> + array[count_non_stale++] = array[i];
>> + }
>> +
>> + /* clear marks */
>> + for (i = 0; i < walk_start_nr; i++)
>> + clear_commit_marks(walk_start[i], STALE);
>
> You can replace this loop with a call to clear_commit_marks_many().
Right! Earlier I was using a 'struct commit_list *' which would not
work, but this 'struct commit ** walk_start' does work. Thanks.
-Stolee
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 3/5] commit-reach: move compare_commits_by_gen
2021-02-01 12:47 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2021-02-01 12:47 ` [PATCH v2 1/5] commit-reach: reduce requirements for remove_redundant() Derrick Stolee via GitGitGadget
2021-02-01 12:47 ` [PATCH v2 2/5] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
@ 2021-02-01 12:47 ` Derrick Stolee via GitGitGadget
2021-02-01 12:47 ` [PATCH v2 4/5] commit-reach: use heuristic in remove_redundant() Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
6 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-01 12:47 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Move this earlier in the file so it can be used by more methods.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
commit-reach.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index 053676e51d0..7bf52e94429 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -17,6 +17,21 @@
static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
+static int compare_commits_by_gen(const void *_a, const void *_b)
+{
+ const struct commit *a = *(const struct commit * const *)_a;
+ const struct commit *b = *(const struct commit * const *)_b;
+
+ timestamp_t generation_a = commit_graph_generation(a);
+ timestamp_t generation_b = commit_graph_generation(b);
+
+ if (generation_a < generation_b)
+ return -1;
+ if (generation_a > generation_b)
+ return 1;
+ return 0;
+}
+
static int queue_has_nonstale(struct prio_queue *queue)
{
int i;
@@ -651,21 +666,6 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
return repo_is_descendant_of(the_repository, commit, list);
}
-static int compare_commits_by_gen(const void *_a, const void *_b)
-{
- const struct commit *a = *(const struct commit * const *)_a;
- const struct commit *b = *(const struct commit * const *)_b;
-
- timestamp_t generation_a = commit_graph_generation(a);
- timestamp_t generation_b = commit_graph_generation(b);
-
- if (generation_a < generation_b)
- return -1;
- if (generation_a > generation_b)
- return 1;
- return 0;
-}
-
int can_all_from_reach_with_flag(struct object_array *from,
unsigned int with_flag,
unsigned int assign_flag,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v2 4/5] commit-reach: use heuristic in remove_redundant()
2021-02-01 12:47 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2021-02-01 12:47 ` [PATCH v2 3/5] commit-reach: move compare_commits_by_gen Derrick Stolee via GitGitGadget
@ 2021-02-01 12:47 ` Derrick Stolee via GitGitGadget
2021-02-01 20:05 ` Junio C Hamano
2021-02-01 12:47 ` [PATCH v2 5/5] commit-reach: stale commits may prune generation further Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
6 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-01 12:47 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Reachability algorithms in commit-reach.c frequently benefit from using
the first-parent history as a heuristic for satisfying reachability
queries. The most obvious example was implemented in 4fbcca4e
(commit-reach: make can_all_from_reach... linear, 2018-07-20).
Update the walk in remove_redundant() to use this same heuristic. Here,
we are walking starting at the parents of the input commits. Sort those
parents and walk from the highest generation to lower. Each time, use
the heuristic of searching the first parent history before continuing to
expand the walk.
The order in which we explore the commits matters, so update
compare_commits_by_gen to break generation number ties with commit date.
This has no effect when the commits are in a commit-graph file with
corrected commit dates computed, but it will assist when the commits are
in the region "above" the commit-graph with "infinite" generation
number. Note that we cannot shift to use
compare_commits_by_gen_then_commit_date as the method prototype is
different. We use compare_commits_by_gen for QSORT() as opposed to as a
priority function.
The important piece is to ensure we short-circuit the walk when we find
that there is a single non-redundant commit. This happens frequently
when looking for merge-bases or comparing several tags with 'git
merge-base --independent'. Use a new count 'count_still_independent' and
if that hits 1 we can stop walking.
To update 'count_still_independent' properly, we add use of the RESULT
flag on the input commits. Then we can detect when we reach one of these
commits and decrease the count. We need to remove the RESULT flag at
that moment because we might re-visit that commit when popping the
stack.
We use the STALE flag to mark parents that have been added to the new
walk_start list, but we need to clear that flag before we start walking
so those flags don't halt our depth-first-search walk.
On my copy of the Linux kernel repository, the performance of 'git
merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11
seconds.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
commit-reach.c | 72 +++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 56 insertions(+), 16 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index 7bf52e94429..d3a6e2bdd04 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -29,6 +29,10 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
return -1;
if (generation_a > generation_b)
return 1;
+ if (a->date < b->date)
+ return -1;
+ if (a->date > b->date)
+ return 1;
return 0;
}
@@ -231,11 +235,10 @@ static int remove_redundant_no_gen(struct repository *r,
static int remove_redundant_with_gen(struct repository *r,
struct commit **array, int cnt)
{
- int i, count_non_stale = 0;
+ int i, count_non_stale = 0, count_still_independent = cnt;
timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
struct commit **walk_start;
size_t walk_start_nr = 0, walk_start_alloc = cnt;
- struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
ALLOC_ARRAY(walk_start, walk_start_alloc);
@@ -245,6 +248,7 @@ static int remove_redundant_with_gen(struct repository *r,
timestamp_t generation;
repo_parse_commit(r, array[i]);
+ array[i]->object.flags |= RESULT;
parents = array[i]->parents;
while (parents) {
@@ -253,7 +257,6 @@ static int remove_redundant_with_gen(struct repository *r,
parents->item->object.flags |= STALE;
ALLOC_GROW(walk_start, walk_start_nr + 1, walk_start_alloc);
walk_start[walk_start_nr++] = parents->item;
- prio_queue_put(&queue, parents->item);
}
parents = parents->next;
}
@@ -264,26 +267,63 @@ static int remove_redundant_with_gen(struct repository *r,
min_generation = generation;
}
- /* push the STALE bits up to min generation */
- while (queue.nr) {
- struct commit_list *parents;
- struct commit *c = prio_queue_get(&queue);
+ QSORT(walk_start, walk_start_nr, compare_commits_by_gen);
- repo_parse_commit(r, c);
+ /* remove STALE bit for now to allow walking through parents */
+ for (i = 0; i < walk_start_nr; i++)
+ walk_start[i]->object.flags &= ~STALE;
- if (commit_graph_generation(c) < min_generation)
- continue;
+ /*
+ * Start walking from the highest generation. Hopefully, it will
+ * find all other items during the first-parent walk, and we can
+ * terminate early. Otherwise, we will do the same amount of work
+ * as before.
+ */
+ for (i = walk_start_nr - 1; i >= 0 && count_still_independent > 1; i--) {
+ /* push the STALE bits up to min generation */
+ struct commit_list *stack = NULL;
- parents = c->parents;
- while (parents) {
- if (!(parents->item->object.flags & STALE)) {
- parents->item->object.flags |= STALE;
- prio_queue_put(&queue, parents->item);
+ commit_list_insert(walk_start[i], &stack);
+ walk_start[i]->object.flags |= STALE;
+
+ while (stack) {
+ struct commit_list *parents;
+ struct commit *c = stack->item;
+
+ repo_parse_commit(r, c);
+
+ if (c->object.flags & RESULT) {
+ c->object.flags &= ~RESULT;
+ if (--count_still_independent <= 1)
+ break;
}
- parents = parents->next;
+
+ if (commit_graph_generation(c) < min_generation) {
+ pop_commit(&stack);
+ continue;
+ }
+
+ parents = c->parents;
+ while (parents) {
+ if (!(parents->item->object.flags & STALE)) {
+ parents->item->object.flags |= STALE;
+ commit_list_insert(parents->item, &stack);
+ break;
+ }
+ parents = parents->next;
+ }
+
+ /* pop if all parents have been visited already */
+ if (!parents)
+ pop_commit(&stack);
}
+ free_commit_list(stack);
}
+ /* clear result */
+ for (i = 0; i < cnt; i++)
+ array[i]->object.flags &= ~RESULT;
+
/* rearrange array */
for (i = count_non_stale = 0; i < cnt; i++) {
if (!(array[i]->object.flags & STALE))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2 4/5] commit-reach: use heuristic in remove_redundant()
2021-02-01 12:47 ` [PATCH v2 4/5] commit-reach: use heuristic in remove_redundant() Derrick Stolee via GitGitGadget
@ 2021-02-01 20:05 ` Junio C Hamano
2021-02-01 21:02 ` Derrick Stolee
0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2021-02-01 20:05 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, Michael Haggerty, me, peff, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> The important piece is to ensure we short-circuit the walk when we find
> that there is a single non-redundant commit. This happens frequently
> when looking for merge-bases or comparing several tags with 'git
> merge-base --independent'. Use a new count 'count_still_independent' and
> if that hits 1 we can stop walking.
That is because when you are left one single thing, it may be able
to reach many other things, but the fact that it by itself won't be
reachable by remaining independent things will not change (because,
that sole remaining independent thing is itself)?
> To update 'count_still_independent' properly, we add use of the RESULT
> flag on the input commits. Then we can detect when we reach one of these
> commits and decrease the count. We need to remove the RESULT flag at
> that moment because we might re-visit that commit when popping the
> stack.
>
> We use the STALE flag to mark parents that have been added to the new
> walk_start list, but we need to clear that flag before we start walking
> so those flags don't halt our depth-first-search walk.
>
> On my copy of the Linux kernel repository, the performance of 'git
> merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11
> seconds.
These two numbers are with commit-graph fully populated with usable
generation numbers, I presume, and it is quite impressive.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 4/5] commit-reach: use heuristic in remove_redundant()
2021-02-01 20:05 ` Junio C Hamano
@ 2021-02-01 21:02 ` Derrick Stolee
0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2021-02-01 21:02 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget
Cc: git, Michael Haggerty, me, peff, René Scharfe,
Derrick Stolee, Derrick Stolee
On 2/1/2021 3:05 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> The important piece is to ensure we short-circuit the walk when we find
>> that there is a single non-redundant commit. This happens frequently
>> when looking for merge-bases or comparing several tags with 'git
>> merge-base --independent'. Use a new count 'count_still_independent' and
>> if that hits 1 we can stop walking.
>
> That is because when you are left one single thing, it may be able
> to reach many other things, but the fact that it by itself won't be
> reachable by remaining independent things will not change (because,
> that sole remaining independent thing is itself)?
Right. If there is only one non-STALE input commit left, then it will
be the only returned result. We will never find that all commits are
redundant because the commit graph is acyclic. The performance
improvement comes from halting the DFS walk: there might be more
commits to walk but they won't change the result.
>> To update 'count_still_independent' properly, we add use of the RESULT
>> flag on the input commits. Then we can detect when we reach one of these
>> commits and decrease the count. We need to remove the RESULT flag at
>> that moment because we might re-visit that commit when popping the
>> stack.
>>
>> We use the STALE flag to mark parents that have been added to the new
>> walk_start list, but we need to clear that flag before we start walking
>> so those flags don't halt our depth-first-search walk.
>>
>> On my copy of the Linux kernel repository, the performance of 'git
>> merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11
>> seconds.
>
> These two numbers are with commit-graph fully populated with usable
> generation numbers, I presume, and it is quite impressive.
Yes, these numbers are with a full commit-graph.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 5/5] commit-reach: stale commits may prune generation further
2021-02-01 12:47 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2021-02-01 12:47 ` [PATCH v2 4/5] commit-reach: use heuristic in remove_redundant() Derrick Stolee via GitGitGadget
@ 2021-02-01 12:47 ` Derrick Stolee via GitGitGadget
2021-02-03 15:59 ` Taylor Blau
2021-02-01 15:48 ` [PATCH v2 0/5] Speed up remove_redundant() Derrick Stolee
2021-02-19 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget
6 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-01 12:47 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
The remove_redundant_with_gen() algorithm performs a depth-first-search
to find commits in the 'array' list, starting at the parents of each
commit in 'array'. The result is that commits in 'array' are marked
STALE when they are reachable from another commit in 'array'.
This depth-first-search is fast when commits lie on or near the
first-parent history of the higher commits. The search terminates early
if all but one commit becomes marked STALE.
However, it is possible that there are two independent commits with high
generation number. In that case, the depth-first-search might languish
by searching in lower generations due to the fixed min_generation used
throughout the method.
With the expectation that commits with lower generation are expected to
become STALE more often, we can optimize further by increasing that
min_generation boundary upon discovery of the commit with minimum
generation.
We must first sort the commits in 'array' by generation. We cannot sort
'array' itself since it must preserve relative order among the returned
results (see revision.c:mark_redundant_parents() for an example).
This simplifies the initialization of min_generation, but it also allows
us to increase the new min_generation when we find the commit with
smallest generation remaining.
This requires more than two commits in order to test, so I used the
Linux kernel repository with a few commits that are slightly off of the
first-parent history. I timed the following command:
git merge-base --independent 2ecedd756908 d2360a398f0b \
1253935ad801 160bab43419e 0e2209629fec 1d0e16ac1a9e
The first two commits have similar generation and are near the v5.10
tag. Commit 160bab43419e is off of the first-parent history behind v5.5,
while the others are scattered somewhere reachable from v5.9. This is
designed to demonstrate the optimization, as that commit within v5.5
would normally cause a lot of extra commit walking.
Since remove_redundant_with_alg() is called only when at least one of
the input commits has a finite generation number, this algorithm is
tested with a commit-graph generated starting at a number of different
tags, the earliest being v5.5.
commit-graph at v5.5:
| Method | Time |
|-----------------------+-------|
| *_no_gen() | 864ms |
| *_with_gen() (before) | 858ms |
| *_with_gen() (after) | 810ms |
commit-graph at v5.7:
| Method | Time |
|-----------------------+-------|
| *_no_gen() | 625ms |
| *_with_gen() (before) | 572ms |
| *_with_gen() (after) | 517ms |
commit-graph at v5.9:
| Method | Time |
|-----------------------+-------|
| *_no_gen() | 268ms |
| *_with_gen() (before) | 224ms |
| *_with_gen() (after) | 202ms |
commit-graph at v5.10:
| Method | Time |
|-----------------------+-------|
| *_no_gen() | 72ms |
| *_with_gen() (before) | 37ms |
| *_with_gen() (after) | 9ms |
Note that these are only modest improvements for the case where the two
independent commits are not in the commit-graph (not until v5.10). All
algorithms get faster as more commits are indexed, which is not a
surprise. However, the cost of walking extra commits is more and more
prevalent in relative terms as more commits are indexed. Finally, the
last case allows us to jump to the minimum generation between the last
two commits (that are actually independent) so we greatly reduce the
cost in that case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
commit-reach.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index d3a6e2bdd04..c2e0747fea4 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -237,15 +237,27 @@ static int remove_redundant_with_gen(struct repository *r,
{
int i, count_non_stale = 0, count_still_independent = cnt;
timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
- struct commit **walk_start;
+ struct commit **walk_start, **sorted;
size_t walk_start_nr = 0, walk_start_alloc = cnt;
+ int min_gen_pos = 0;
+
+ /*
+ * Sort the input by generation number, ascending. This allows
+ * us to increase the "min_generation" limit when we discover
+ * the commit with lowest generation is STALE. The index
+ * min_gen_pos points to the current position within 'array'
+ * that is not yet known to be STALE.
+ */
+ ALLOC_ARRAY(sorted, cnt);
+ COPY_ARRAY(sorted, array, cnt);
+ QSORT(sorted, cnt, compare_commits_by_gen);
+ min_generation = commit_graph_generation(sorted[0]);
ALLOC_ARRAY(walk_start, walk_start_alloc);
/* Mark all parents of the input as STALE */
for (i = 0; i < cnt; i++) {
struct commit_list *parents;
- timestamp_t generation;
repo_parse_commit(r, array[i]);
array[i]->object.flags |= RESULT;
@@ -260,11 +272,6 @@ static int remove_redundant_with_gen(struct repository *r,
}
parents = parents->next;
}
-
- generation = commit_graph_generation(array[i]);
-
- if (generation < min_generation)
- min_generation = generation;
}
QSORT(walk_start, walk_start_nr, compare_commits_by_gen);
@@ -296,6 +303,12 @@ static int remove_redundant_with_gen(struct repository *r,
c->object.flags &= ~RESULT;
if (--count_still_independent <= 1)
break;
+ if (oideq(&c->object.oid, &sorted[min_gen_pos]->object.oid)) {
+ while (min_gen_pos < cnt - 1 &&
+ (sorted[min_gen_pos]->object.flags & STALE))
+ min_gen_pos++;
+ min_generation = commit_graph_generation(sorted[min_gen_pos]);
+ }
}
if (commit_graph_generation(c) < min_generation) {
@@ -319,6 +332,7 @@ static int remove_redundant_with_gen(struct repository *r,
}
free_commit_list(stack);
}
+ free(sorted);
/* clear result */
for (i = 0; i < cnt; i++)
--
gitgitgadget
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2 5/5] commit-reach: stale commits may prune generation further
2021-02-01 12:47 ` [PATCH v2 5/5] commit-reach: stale commits may prune generation further Derrick Stolee via GitGitGadget
@ 2021-02-03 15:59 ` Taylor Blau
0 siblings, 0 replies; 36+ messages in thread
From: Taylor Blau @ 2021-02-03 15:59 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
On Mon, Feb 01, 2021 at 12:47:27PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> Note that these are only modest improvements for the case where the two
> independent commits are not in the commit-graph (not until v5.10). All
> algorithms get faster as more commits are indexed, which is not a
> surprise. However, the cost of walking extra commits is more and more
> prevalent in relative terms as more commits are indexed. Finally, the
> last case allows us to jump to the minimum generation between the last
> two commits (that are actually independent) so we greatly reduce the
> cost in that case.
Very nice. The explanation and implementation makes sense to me.
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> commit-reach.c | 28 +++++++++++++++++++++-------
> 1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/commit-reach.c b/commit-reach.c
> index d3a6e2bdd04..c2e0747fea4 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -237,15 +237,27 @@ static int remove_redundant_with_gen(struct repository *r,
> {
> int i, count_non_stale = 0, count_still_independent = cnt;
> timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
> - struct commit **walk_start;
> + struct commit **walk_start, **sorted;
> size_t walk_start_nr = 0, walk_start_alloc = cnt;
> + int min_gen_pos = 0;
> +
> + /*
> + * Sort the input by generation number, ascending. This allows
> + * us to increase the "min_generation" limit when we discover
> + * the commit with lowest generation is STALE. The index
> + * min_gen_pos points to the current position within 'array'
> + * that is not yet known to be STALE.
> + */
> + ALLOC_ARRAY(sorted, cnt);
> + COPY_ARRAY(sorted, array, cnt);
> + QSORT(sorted, cnt, compare_commits_by_gen);
> + min_generation = commit_graph_generation(sorted[0]);
This line caught my eye, but we return early in
commit-reach.c:reduce_heads() before even calling remove_redundant()
(which itself calls remove_redundant_with_gen()), so it's always OK to
assume that we have at least one element in 'array'.
This patch and the others in v2 all look good to me.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/5] Speed up remove_redundant()
2021-02-01 12:47 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
` (4 preceding siblings ...)
2021-02-01 12:47 ` [PATCH v2 5/5] commit-reach: stale commits may prune generation further Derrick Stolee via GitGitGadget
@ 2021-02-01 15:48 ` Derrick Stolee
2021-02-18 23:25 ` Junio C Hamano
2021-02-19 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget
6 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee @ 2021-02-01 15:48 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget, git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe, Derrick Stolee
On 2/1/2021 7:47 AM, Derrick Stolee via GitGitGadget wrote:
> My tests were on the Linux kernel repository, testing this command:
>
> git merge-base --independent $(git for-each-ref refs/tags --format="%(refname)")
>
> Before: 16.4s
> After Patch 1: 1.1s
> After Patch 2: 0.1s
During my testing of v2, I discovered how remove_redundant()
is being used as part of 'git log --simplify-merges'. So, here
are some performance numbers for the Linux kernel repository
with a full commit-graph and changed-path Bloom filters running
git log --oneline -n10 --simplify-merges -- kernel/
Benchmark #1: old
Time (mean ± σ): 5.770 s ± 0.030 s [User: 5.551 s, System: 0.219 s]
Range (min … max): 5.728 s … 5.827 s 10 runs
Benchmark #2: new
Time (mean ± σ): 5.030 s ± 0.017 s [User: 4.805 s, System: 0.224 s]
Range (min … max): 4.999 s … 5.054 s 10 runs
Summary
'new' ran
1.15 ± 0.01 times faster than 'old'
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/5] Speed up remove_redundant()
2021-02-01 15:48 ` [PATCH v2 0/5] Speed up remove_redundant() Derrick Stolee
@ 2021-02-18 23:25 ` Junio C Hamano
2021-02-19 12:17 ` Derrick Stolee
0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2021-02-18 23:25 UTC (permalink / raw)
To: Derrick Stolee
Cc: Derrick Stolee via GitGitGadget, git, Michael Haggerty, me, peff,
René Scharfe, Derrick Stolee
Derrick Stolee <stolee@gmail.com> writes:
> On 2/1/2021 7:47 AM, Derrick Stolee via GitGitGadget wrote:
>> My tests were on the Linux kernel repository, testing this command:
>>
>> git merge-base --independent $(git for-each-ref refs/tags --format="%(refname)")
>>
>> Before: 16.4s
>> After Patch 1: 1.1s
>> After Patch 2: 0.1s
>
> During my testing of v2, I discovered how remove_redundant()
> is being used as part of 'git log --simplify-merges'. So, here
> are some performance numbers for the Linux kernel repository
> with a full commit-graph and changed-path Bloom filters running
>
> git log --oneline -n10 --simplify-merges -- kernel/
>
> Benchmark #1: old
> Time (mean ± σ): 5.770 s ± 0.030 s [User: 5.551 s, System: 0.219 s]
> Range (min … max): 5.728 s … 5.827 s 10 runs
>
> Benchmark #2: new
> Time (mean ± σ): 5.030 s ± 0.017 s [User: 4.805 s, System: 0.224 s]
> Range (min … max): 4.999 s … 5.054 s 10 runs
>
> Summary
> 'new' ran
> 1.15 ± 0.01 times faster than 'old'
This topic seems to have been forgotten? Is the only thing missing
is an updated 2/5 or is there anything more that is needed?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/5] Speed up remove_redundant()
2021-02-18 23:25 ` Junio C Hamano
@ 2021-02-19 12:17 ` Derrick Stolee
2021-02-20 3:32 ` Junio C Hamano
0 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee @ 2021-02-19 12:17 UTC (permalink / raw)
To: Junio C Hamano
Cc: Derrick Stolee via GitGitGadget, git, Michael Haggerty, me, peff,
René Scharfe, Derrick Stolee
On 2/18/2021 6:25 PM, Junio C Hamano wrote:
> This topic seems to have been forgotten? Is the only thing missing
> is an updated 2/5 or is there anything more that is needed?
I had forgotten about the comments on patch 2/5, sorry. That explains
why it hasn't merged yet...
I'll get right on it.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/5] Speed up remove_redundant()
2021-02-19 12:17 ` Derrick Stolee
@ 2021-02-20 3:32 ` Junio C Hamano
0 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2021-02-20 3:32 UTC (permalink / raw)
To: Derrick Stolee
Cc: Derrick Stolee via GitGitGadget, git, Michael Haggerty, me, peff,
René Scharfe, Derrick Stolee
Derrick Stolee <stolee@gmail.com> writes:
> On 2/18/2021 6:25 PM, Junio C Hamano wrote:
>> This topic seems to have been forgotten? Is the only thing missing
>> is an updated 2/5 or is there anything more that is needed?
>
> I had forgotten about the comments on patch 2/5, sorry. That explains
> why it hasn't merged yet...
>
> I'll get right on it.
Thanks.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 0/5] Speed up remove_redundant()
2021-02-01 12:47 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
` (5 preceding siblings ...)
2021-02-01 15:48 ` [PATCH v2 0/5] Speed up remove_redundant() Derrick Stolee
@ 2021-02-19 12:34 ` Derrick Stolee via GitGitGadget
2021-02-19 12:34 ` [PATCH v3 1/5] commit-reach: reduce requirements for remove_redundant() Derrick Stolee via GitGitGadget
` (4 more replies)
6 siblings, 5 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-19 12:34 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee
Michael Haggerty pointed out that git merge-base --independent is quite slow
when many commits are provided in the input. This boils down to some
quadratic behavior in remove_redundant() because it calls
paint_down_to_common() in a loop.
This series avoids that by using a single commit walk, pushing a flag that
means "this commit is reachable from a parent of a commit in the list." At
the end of the walk, the input commits without that flag are independent.
There is an extra performance enhancement using the first-parent history as
a heuristic. This helps only when there is a single independent result, as
we can short circuit the walk when all other inputs are found to be
reachable from the top one. It also gets faster when we discover the commit
with lowest generation and can restrict our walk further to the next-lowest
generation number.
My tests were on the Linux kernel repository, testing this command:
git merge-base --independent $(git for-each-ref refs/tags --format="%(refname)")
Before: 16.4s
After Patch 1: 1.1s
After Patch 2: 0.1s
This does not change the correctness of the function. It is tested carefully
in t6600-test-reach.c, among existing merge-base tests. There are also more
tests in scripts like t6012-rev-list-simplify.sh that trigger the new logic
under GIT_TEST_COMMIT_GRAPH=1.
Updates in V3
=============
* Patch 2 is updated to René's comments. Sorry I dropped the ball on this.
I had the changes locally, but never pushed them until now.
Updates in V2
=============
* The old algorithm is not entirely removed. It is still faster than the
new algorithm if all input commits are not in the commit-graph file
(unless the commit-graph file is really close to the input commits, but
that is an unreasonable expectation). Now, the new logic only happens if
there is an input commit with finite generation number.
* There was a copy of 'array' being used for the final reorder which is now
removed. We still need to be careful about the RESULT bit.
* The final patch is new, adding yet another speedup in the form of
increasing the min_generation cutoff when we find the input commit of
smallest generation. This can reduce the time significantly when given
many inputs. This does require copying the 'array' again, so we can sort
by generation number while preserving the original order of 'array' which
is required by some callers.
Thanks, -Stolee
Derrick Stolee (5):
commit-reach: reduce requirements for remove_redundant()
commit-reach: use one walk in remove_redundant()
commit-reach: move compare_commits_by_gen
commit-reach: use heuristic in remove_redundant()
commit-reach: stale commits may prune generation further
commit-reach.c | 190 ++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 165 insertions(+), 25 deletions(-)
base-commit: 5a3b130cad0d5c770f766e3af6d32b41766374c0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-852%2Fderrickstolee%2Fmerge-independent-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-852/derrickstolee/merge-independent-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/852
Range-diff vs v2:
1: 649f6799e6bf = 1: 649f6799e6bf commit-reach: reduce requirements for remove_redundant()
2: 2f80ae5fcb00 ! 2: c2c08989ce60 commit-reach: use one walk in remove_redundant()
@@ commit-reach.c: struct commit_list *get_octopus_merge_bases(struct commit_list *
unsigned char *redundant;
int *filled_index;
@@ commit-reach.c: static int remove_redundant(struct repository *r, struct commit **array, int cnt
- for (i = filled = 0; i < cnt; i++)
- if (!redundant[i])
- array[filled++] = work[i];
-+ for (j = filled, i = 0; i < cnt; i++)
-+ if (redundant[i])
-+ array[j++] = work[i];
- free(work);
- free(redundant);
- free(filled_index);
return filled;
}
@@ commit-reach.c: static int remove_redundant(struct repository *r, struct commit
+ }
+
+ /* clear marks */
-+ for (i = 0; i < walk_start_nr; i++)
-+ clear_commit_marks(walk_start[i], STALE);
++ clear_commit_marks_many(walk_start_nr, walk_start, STALE);
+ free(walk_start);
+
+ return count_non_stale;
3: 009f64b53c95 = 3: 1d3c5ebbb632 commit-reach: move compare_commits_by_gen
4: 83feabeebb5f = 4: c1fc174d42aa commit-reach: use heuristic in remove_redundant()
5: 14f0974c9872 = 5: 539db83bd735 commit-reach: stale commits may prune generation further
--
gitgitgadget
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/5] commit-reach: reduce requirements for remove_redundant()
2021-02-19 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget
@ 2021-02-19 12:34 ` Derrick Stolee via GitGitGadget
2021-02-19 12:34 ` [PATCH v3 2/5] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
4 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-19 12:34 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Remove a comment at the beggining of remove_redundant() that mentions a
reordering of the input array to have the initial segment be the
independent commits and the final segment be the redundant commits.
While this behavior is followed in remove_redundant(), no callers rely
on that behavior.
Remove the final loop that copies this final segment and update the
comment to match the new behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
commit-reach.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index e38771ca5a1f..9af51fe7e078 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -160,9 +160,10 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
{
/*
* Some commit in the array may be an ancestor of
- * another commit. Move such commit to the end of
- * the array, and return the number of commits that
- * are independent from each other.
+ * another commit. Move the independent commits to the
+ * beginning of 'array' and return their number. Callers
+ * should not rely upon the contents of 'array' after
+ * that number.
*/
struct commit **work;
unsigned char *redundant;
@@ -209,9 +210,6 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
for (i = filled = 0; i < cnt; i++)
if (!redundant[i])
array[filled++] = work[i];
- for (j = filled, i = 0; i < cnt; i++)
- if (redundant[i])
- array[j++] = work[i];
free(work);
free(redundant);
free(filled_index);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 2/5] commit-reach: use one walk in remove_redundant()
2021-02-19 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-02-19 12:34 ` [PATCH v3 1/5] commit-reach: reduce requirements for remove_redundant() Derrick Stolee via GitGitGadget
@ 2021-02-19 12:34 ` Derrick Stolee via GitGitGadget
2021-02-19 12:34 ` [PATCH v3 3/5] commit-reach: move compare_commits_by_gen Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
4 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-19 12:34 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
The current implementation of remove_redundant() uses several calls to
paint_down_to_common() to determine that commits are independent of each
other. This leads to quadratic behavior when many inputs are passed to
commands such as 'git merge-base'.
For example, in the Linux kernel repository, I tested the performance
by passing all tags:
git merge-base --independent $(git for-each-ref refs/tags --format="$(refname)")
(Note: I had to delete the tags v2.6.11-tree and v2.6.11 as they do
not point to commits.)
Here is the performance improvement introduced by this change:
Before: 16.4s
After: 1.1s
This performance improvement requires the commit-graph file to be
present. We keep the old algorithm around as remove_redundant_no_gen()
and use it when generation_numbers_enabled() is false. This is similar
to other algorithms within commit-reach.c. The new algorithm is
implemented in remove_redundant_with_gen().
The basic approach is to do one commit walk instead of many. First, scan
all commits in the list and mark their _parents_ with the STALE flag.
This flag will indicate commits that are reachable from one of the
inputs, except not including themselves. Then, walk commits until
covering all commits up to the minimum generation number pushing the
STALE flag throughout.
At the end, we need to clear the STALE bit from all of the commits
we walked. We move the non-stale commits in 'array' to the beginning of
the list, and this might overwrite stale commits. However, we store an
array of commits that started the walk, and use clear_commit_marks() on
each of those starting commits. That method will walk the reachable
commits with the STALE bit and clear them all. This makes the algorithm
safe for re-entry or for other uses of those commits after this walk.
This logic is covered by tests in t6600-test-reach.sh, so the behavior
does not change. This is tested both in the case with a commit-graph and
without.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
commit-reach.c | 104 +++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 96 insertions(+), 8 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index 9af51fe7e078..7a3a1eb1a26e 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -156,15 +156,9 @@ struct commit_list *get_octopus_merge_bases(struct commit_list *in)
return ret;
}
-static int remove_redundant(struct repository *r, struct commit **array, int cnt)
+static int remove_redundant_no_gen(struct repository *r,
+ struct commit **array, int cnt)
{
- /*
- * Some commit in the array may be an ancestor of
- * another commit. Move the independent commits to the
- * beginning of 'array' and return their number. Callers
- * should not rely upon the contents of 'array' after
- * that number.
- */
struct commit **work;
unsigned char *redundant;
int *filled_index;
@@ -216,6 +210,100 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
return filled;
}
+static int remove_redundant_with_gen(struct repository *r,
+ struct commit **array, int cnt)
+{
+ int i, count_non_stale = 0;
+ timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
+ struct commit **walk_start;
+ size_t walk_start_nr = 0, walk_start_alloc = cnt;
+ struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+
+ ALLOC_ARRAY(walk_start, walk_start_alloc);
+
+ /* Mark all parents of the input as STALE */
+ for (i = 0; i < cnt; i++) {
+ struct commit_list *parents;
+ timestamp_t generation;
+
+ repo_parse_commit(r, array[i]);
+ parents = array[i]->parents;
+
+ while (parents) {
+ repo_parse_commit(r, parents->item);
+ if (!(parents->item->object.flags & STALE)) {
+ parents->item->object.flags |= STALE;
+ ALLOC_GROW(walk_start, walk_start_nr + 1, walk_start_alloc);
+ walk_start[walk_start_nr++] = parents->item;
+ prio_queue_put(&queue, parents->item);
+ }
+ parents = parents->next;
+ }
+
+ generation = commit_graph_generation(array[i]);
+
+ if (generation < min_generation)
+ min_generation = generation;
+ }
+
+ /* push the STALE bits up to min generation */
+ while (queue.nr) {
+ struct commit_list *parents;
+ struct commit *c = prio_queue_get(&queue);
+
+ repo_parse_commit(r, c);
+
+ if (commit_graph_generation(c) < min_generation)
+ continue;
+
+ parents = c->parents;
+ while (parents) {
+ if (!(parents->item->object.flags & STALE)) {
+ parents->item->object.flags |= STALE;
+ prio_queue_put(&queue, parents->item);
+ }
+ parents = parents->next;
+ }
+ }
+
+ /* rearrange array */
+ for (i = count_non_stale = 0; i < cnt; i++) {
+ if (!(array[i]->object.flags & STALE))
+ array[count_non_stale++] = array[i];
+ }
+
+ /* clear marks */
+ clear_commit_marks_many(walk_start_nr, walk_start, STALE);
+ free(walk_start);
+
+ return count_non_stale;
+}
+
+static int remove_redundant(struct repository *r, struct commit **array, int cnt)
+{
+ /*
+ * Some commit in the array may be an ancestor of
+ * another commit. Move the independent commits to the
+ * beginning of 'array' and return their number. Callers
+ * should not rely upon the contents of 'array' after
+ * that number.
+ */
+ if (generation_numbers_enabled(r)) {
+ int i;
+
+ /*
+ * If we have a single commit with finite generation
+ * number, then the _with_gen algorithm is preferred.
+ */
+ for (i = 0; i < cnt; i++) {
+ if (commit_graph_generation(array[i]) < GENERATION_NUMBER_INFINITY)
+ return remove_redundant_with_gen(r, array, cnt);
+ }
+ }
+
+ return remove_redundant_no_gen(r, array, cnt);
+}
+
static struct commit_list *get_merge_bases_many_0(struct repository *r,
struct commit *one,
int n,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 3/5] commit-reach: move compare_commits_by_gen
2021-02-19 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-02-19 12:34 ` [PATCH v3 1/5] commit-reach: reduce requirements for remove_redundant() Derrick Stolee via GitGitGadget
2021-02-19 12:34 ` [PATCH v3 2/5] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
@ 2021-02-19 12:34 ` Derrick Stolee via GitGitGadget
2021-02-19 12:34 ` [PATCH v3 4/5] commit-reach: use heuristic in remove_redundant() Derrick Stolee via GitGitGadget
2021-02-19 12:34 ` [PATCH v3 5/5] commit-reach: stale commits may prune generation further Derrick Stolee via GitGitGadget
4 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-19 12:34 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Move this earlier in the file so it can be used by more methods.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
commit-reach.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index 7a3a1eb1a26e..a34c5ba09e81 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -17,6 +17,21 @@
static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
+static int compare_commits_by_gen(const void *_a, const void *_b)
+{
+ const struct commit *a = *(const struct commit * const *)_a;
+ const struct commit *b = *(const struct commit * const *)_b;
+
+ timestamp_t generation_a = commit_graph_generation(a);
+ timestamp_t generation_b = commit_graph_generation(b);
+
+ if (generation_a < generation_b)
+ return -1;
+ if (generation_a > generation_b)
+ return 1;
+ return 0;
+}
+
static int queue_has_nonstale(struct prio_queue *queue)
{
int i;
@@ -647,21 +662,6 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
return repo_is_descendant_of(the_repository, commit, list);
}
-static int compare_commits_by_gen(const void *_a, const void *_b)
-{
- const struct commit *a = *(const struct commit * const *)_a;
- const struct commit *b = *(const struct commit * const *)_b;
-
- timestamp_t generation_a = commit_graph_generation(a);
- timestamp_t generation_b = commit_graph_generation(b);
-
- if (generation_a < generation_b)
- return -1;
- if (generation_a > generation_b)
- return 1;
- return 0;
-}
-
int can_all_from_reach_with_flag(struct object_array *from,
unsigned int with_flag,
unsigned int assign_flag,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 4/5] commit-reach: use heuristic in remove_redundant()
2021-02-19 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2021-02-19 12:34 ` [PATCH v3 3/5] commit-reach: move compare_commits_by_gen Derrick Stolee via GitGitGadget
@ 2021-02-19 12:34 ` Derrick Stolee via GitGitGadget
2021-02-19 12:34 ` [PATCH v3 5/5] commit-reach: stale commits may prune generation further Derrick Stolee via GitGitGadget
4 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-19 12:34 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Reachability algorithms in commit-reach.c frequently benefit from using
the first-parent history as a heuristic for satisfying reachability
queries. The most obvious example was implemented in 4fbcca4e
(commit-reach: make can_all_from_reach... linear, 2018-07-20).
Update the walk in remove_redundant() to use this same heuristic. Here,
we are walking starting at the parents of the input commits. Sort those
parents and walk from the highest generation to lower. Each time, use
the heuristic of searching the first parent history before continuing to
expand the walk.
The order in which we explore the commits matters, so update
compare_commits_by_gen to break generation number ties with commit date.
This has no effect when the commits are in a commit-graph file with
corrected commit dates computed, but it will assist when the commits are
in the region "above" the commit-graph with "infinite" generation
number. Note that we cannot shift to use
compare_commits_by_gen_then_commit_date as the method prototype is
different. We use compare_commits_by_gen for QSORT() as opposed to as a
priority function.
The important piece is to ensure we short-circuit the walk when we find
that there is a single non-redundant commit. This happens frequently
when looking for merge-bases or comparing several tags with 'git
merge-base --independent'. Use a new count 'count_still_independent' and
if that hits 1 we can stop walking.
To update 'count_still_independent' properly, we add use of the RESULT
flag on the input commits. Then we can detect when we reach one of these
commits and decrease the count. We need to remove the RESULT flag at
that moment because we might re-visit that commit when popping the
stack.
We use the STALE flag to mark parents that have been added to the new
walk_start list, but we need to clear that flag before we start walking
so those flags don't halt our depth-first-search walk.
On my copy of the Linux kernel repository, the performance of 'git
merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11
seconds.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
commit-reach.c | 72 +++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 56 insertions(+), 16 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index a34c5ba09e81..399f2a8673e0 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -29,6 +29,10 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
return -1;
if (generation_a > generation_b)
return 1;
+ if (a->date < b->date)
+ return -1;
+ if (a->date > b->date)
+ return 1;
return 0;
}
@@ -228,11 +232,10 @@ static int remove_redundant_no_gen(struct repository *r,
static int remove_redundant_with_gen(struct repository *r,
struct commit **array, int cnt)
{
- int i, count_non_stale = 0;
+ int i, count_non_stale = 0, count_still_independent = cnt;
timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
struct commit **walk_start;
size_t walk_start_nr = 0, walk_start_alloc = cnt;
- struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
ALLOC_ARRAY(walk_start, walk_start_alloc);
@@ -242,6 +245,7 @@ static int remove_redundant_with_gen(struct repository *r,
timestamp_t generation;
repo_parse_commit(r, array[i]);
+ array[i]->object.flags |= RESULT;
parents = array[i]->parents;
while (parents) {
@@ -250,7 +254,6 @@ static int remove_redundant_with_gen(struct repository *r,
parents->item->object.flags |= STALE;
ALLOC_GROW(walk_start, walk_start_nr + 1, walk_start_alloc);
walk_start[walk_start_nr++] = parents->item;
- prio_queue_put(&queue, parents->item);
}
parents = parents->next;
}
@@ -261,26 +264,63 @@ static int remove_redundant_with_gen(struct repository *r,
min_generation = generation;
}
- /* push the STALE bits up to min generation */
- while (queue.nr) {
- struct commit_list *parents;
- struct commit *c = prio_queue_get(&queue);
+ QSORT(walk_start, walk_start_nr, compare_commits_by_gen);
- repo_parse_commit(r, c);
+ /* remove STALE bit for now to allow walking through parents */
+ for (i = 0; i < walk_start_nr; i++)
+ walk_start[i]->object.flags &= ~STALE;
- if (commit_graph_generation(c) < min_generation)
- continue;
+ /*
+ * Start walking from the highest generation. Hopefully, it will
+ * find all other items during the first-parent walk, and we can
+ * terminate early. Otherwise, we will do the same amount of work
+ * as before.
+ */
+ for (i = walk_start_nr - 1; i >= 0 && count_still_independent > 1; i--) {
+ /* push the STALE bits up to min generation */
+ struct commit_list *stack = NULL;
- parents = c->parents;
- while (parents) {
- if (!(parents->item->object.flags & STALE)) {
- parents->item->object.flags |= STALE;
- prio_queue_put(&queue, parents->item);
+ commit_list_insert(walk_start[i], &stack);
+ walk_start[i]->object.flags |= STALE;
+
+ while (stack) {
+ struct commit_list *parents;
+ struct commit *c = stack->item;
+
+ repo_parse_commit(r, c);
+
+ if (c->object.flags & RESULT) {
+ c->object.flags &= ~RESULT;
+ if (--count_still_independent <= 1)
+ break;
}
- parents = parents->next;
+
+ if (commit_graph_generation(c) < min_generation) {
+ pop_commit(&stack);
+ continue;
+ }
+
+ parents = c->parents;
+ while (parents) {
+ if (!(parents->item->object.flags & STALE)) {
+ parents->item->object.flags |= STALE;
+ commit_list_insert(parents->item, &stack);
+ break;
+ }
+ parents = parents->next;
+ }
+
+ /* pop if all parents have been visited already */
+ if (!parents)
+ pop_commit(&stack);
}
+ free_commit_list(stack);
}
+ /* clear result */
+ for (i = 0; i < cnt; i++)
+ array[i]->object.flags &= ~RESULT;
+
/* rearrange array */
for (i = count_non_stale = 0; i < cnt; i++) {
if (!(array[i]->object.flags & STALE))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 5/5] commit-reach: stale commits may prune generation further
2021-02-19 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2021-02-19 12:34 ` [PATCH v3 4/5] commit-reach: use heuristic in remove_redundant() Derrick Stolee via GitGitGadget
@ 2021-02-19 12:34 ` Derrick Stolee via GitGitGadget
4 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-19 12:34 UTC (permalink / raw)
To: git
Cc: Michael Haggerty, me, peff, gitster, René Scharfe,
Derrick Stolee, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
The remove_redundant_with_gen() algorithm performs a depth-first-search
to find commits in the 'array' list, starting at the parents of each
commit in 'array'. The result is that commits in 'array' are marked
STALE when they are reachable from another commit in 'array'.
This depth-first-search is fast when commits lie on or near the
first-parent history of the higher commits. The search terminates early
if all but one commit becomes marked STALE.
However, it is possible that there are two independent commits with high
generation number. In that case, the depth-first-search might languish
by searching in lower generations due to the fixed min_generation used
throughout the method.
With the expectation that commits with lower generation are expected to
become STALE more often, we can optimize further by increasing that
min_generation boundary upon discovery of the commit with minimum
generation.
We must first sort the commits in 'array' by generation. We cannot sort
'array' itself since it must preserve relative order among the returned
results (see revision.c:mark_redundant_parents() for an example).
This simplifies the initialization of min_generation, but it also allows
us to increase the new min_generation when we find the commit with
smallest generation remaining.
This requires more than two commits in order to test, so I used the
Linux kernel repository with a few commits that are slightly off of the
first-parent history. I timed the following command:
git merge-base --independent 2ecedd756908 d2360a398f0b \
1253935ad801 160bab43419e 0e2209629fec 1d0e16ac1a9e
The first two commits have similar generation and are near the v5.10
tag. Commit 160bab43419e is off of the first-parent history behind v5.5,
while the others are scattered somewhere reachable from v5.9. This is
designed to demonstrate the optimization, as that commit within v5.5
would normally cause a lot of extra commit walking.
Since remove_redundant_with_alg() is called only when at least one of
the input commits has a finite generation number, this algorithm is
tested with a commit-graph generated starting at a number of different
tags, the earliest being v5.5.
commit-graph at v5.5:
| Method | Time |
|-----------------------+-------|
| *_no_gen() | 864ms |
| *_with_gen() (before) | 858ms |
| *_with_gen() (after) | 810ms |
commit-graph at v5.7:
| Method | Time |
|-----------------------+-------|
| *_no_gen() | 625ms |
| *_with_gen() (before) | 572ms |
| *_with_gen() (after) | 517ms |
commit-graph at v5.9:
| Method | Time |
|-----------------------+-------|
| *_no_gen() | 268ms |
| *_with_gen() (before) | 224ms |
| *_with_gen() (after) | 202ms |
commit-graph at v5.10:
| Method | Time |
|-----------------------+-------|
| *_no_gen() | 72ms |
| *_with_gen() (before) | 37ms |
| *_with_gen() (after) | 9ms |
Note that these are only modest improvements for the case where the two
independent commits are not in the commit-graph (not until v5.10). All
algorithms get faster as more commits are indexed, which is not a
surprise. However, the cost of walking extra commits is more and more
prevalent in relative terms as more commits are indexed. Finally, the
last case allows us to jump to the minimum generation between the last
two commits (that are actually independent) so we greatly reduce the
cost in that case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
commit-reach.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index 399f2a8673e0..2ea84d3dc074 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -234,15 +234,27 @@ static int remove_redundant_with_gen(struct repository *r,
{
int i, count_non_stale = 0, count_still_independent = cnt;
timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
- struct commit **walk_start;
+ struct commit **walk_start, **sorted;
size_t walk_start_nr = 0, walk_start_alloc = cnt;
+ int min_gen_pos = 0;
+
+ /*
+ * Sort the input by generation number, ascending. This allows
+ * us to increase the "min_generation" limit when we discover
+ * the commit with lowest generation is STALE. The index
+ * min_gen_pos points to the current position within 'array'
+ * that is not yet known to be STALE.
+ */
+ ALLOC_ARRAY(sorted, cnt);
+ COPY_ARRAY(sorted, array, cnt);
+ QSORT(sorted, cnt, compare_commits_by_gen);
+ min_generation = commit_graph_generation(sorted[0]);
ALLOC_ARRAY(walk_start, walk_start_alloc);
/* Mark all parents of the input as STALE */
for (i = 0; i < cnt; i++) {
struct commit_list *parents;
- timestamp_t generation;
repo_parse_commit(r, array[i]);
array[i]->object.flags |= RESULT;
@@ -257,11 +269,6 @@ static int remove_redundant_with_gen(struct repository *r,
}
parents = parents->next;
}
-
- generation = commit_graph_generation(array[i]);
-
- if (generation < min_generation)
- min_generation = generation;
}
QSORT(walk_start, walk_start_nr, compare_commits_by_gen);
@@ -293,6 +300,12 @@ static int remove_redundant_with_gen(struct repository *r,
c->object.flags &= ~RESULT;
if (--count_still_independent <= 1)
break;
+ if (oideq(&c->object.oid, &sorted[min_gen_pos]->object.oid)) {
+ while (min_gen_pos < cnt - 1 &&
+ (sorted[min_gen_pos]->object.flags & STALE))
+ min_gen_pos++;
+ min_generation = commit_graph_generation(sorted[min_gen_pos]);
+ }
}
if (commit_graph_generation(c) < min_generation) {
@@ -316,6 +329,7 @@ static int remove_redundant_with_gen(struct repository *r,
}
free_commit_list(stack);
}
+ free(sorted);
/* clear result */
for (i = 0; i < cnt; i++)
--
gitgitgadget
^ permalink raw reply related [flat|nested] 36+ messages in thread