git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH 0/2] Accelerate "git merge-base --is-ancestor"
@ 2020-06-17 17:24 Derrick Stolee via GitGitGadget
  2020-06-17 17:24 ` [PATCH 1/2] commit-reach: create repo_is_descendant_of() Derrick Stolee via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-06-17 17:24 UTC (permalink / raw)
  To: git; +Cc: szeder.dev, avarab, abhishekkumar8222, me, Derrick Stolee

It was recently [1] reported (and not-so-recently [2] reported) that "git
merge-base --is-ancestor" can be pretty slow. In fact, it is regularly
slower than "git branch --contains" or "git tag --contains", which are
answering a "harder" query.

[1] https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/

[2] https://lore.kernel.org/git/87608bawoa.fsf@evledraar.gmail.com/

The root cause is that the in_merge_base() implementation is skipping the
fast can_all_from_reach() implementation and using paint_down_to_common()
instead. Note that these are equivalent: a commit A is in the set of
merge-bases between A and B if and only if B can reach A.

This fixes the issue, and makes the performance degradation reported by
Szeder a non-issue.

Thanks, -Stolee

Derrick Stolee (2):
  commit-reach: create repo_is_descendant_of()
  commit-reach: use fast logic in repo_in_merge_base

 commit-reach.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)


base-commit: b3d7a52fac39193503a0b6728771d1bf6a161464
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-664%2Fderrickstolee%2Fmerge-base-is-ancestor-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-664/derrickstolee/merge-base-is-ancestor-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/664
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] commit-reach: create repo_is_descendant_of()
  2020-06-17 17:24 [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Derrick Stolee via GitGitGadget
@ 2020-06-17 17:24 ` Derrick Stolee via GitGitGadget
  2020-06-29 13:40   ` Taylor Blau
  2020-06-17 17:24 ` [PATCH 2/2] commit-reach: use fast logic in repo_in_merge_base Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-06-17 17:24 UTC (permalink / raw)
  To: git
  Cc: szeder.dev, avarab, abhishekkumar8222, me, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The next change will make repo_in_merge_bases() depend on the logic in
is_descendant_of(), but we need to make the method independent of
the_repository first.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 4ca7e706a18..13722430aa5 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -283,7 +283,9 @@ struct commit_list *repo_get_merge_bases(struct repository *r,
 /*
  * Is "commit" a descendant of one of the elements on the "with_commit" list?
  */
-int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
+static int repo_is_descendant_of(struct repository *r,
+				 struct commit *commit,
+				 struct commit_list *with_commit)
 {
 	if (!with_commit)
 		return 1;
@@ -301,13 +303,18 @@ int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
 
 			other = with_commit->item;
 			with_commit = with_commit->next;
-			if (in_merge_bases(other, commit))
+			if (repo_in_merge_bases(r, other, commit))
 				return 1;
 		}
 		return 0;
 	}
 }
 
+int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
+{
+	return repo_is_descendant_of(the_repository, commit, with_commit);
+}
+
 /*
  * Is "commit" an ancestor of one of the "references"?
  */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] commit-reach: use fast logic in repo_in_merge_base
  2020-06-17 17:24 [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Derrick Stolee via GitGitGadget
  2020-06-17 17:24 ` [PATCH 1/2] commit-reach: create repo_is_descendant_of() Derrick Stolee via GitGitGadget
@ 2020-06-17 17:24 ` Derrick Stolee via GitGitGadget
  2020-06-17 20:46 ` [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Junio C Hamano
  2020-06-19  6:10 ` Abhishek Kumar
  3 siblings, 0 replies; 8+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-06-17 17:24 UTC (permalink / raw)
  To: git
  Cc: szeder.dev, avarab, abhishekkumar8222, me, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The repo_is_descendant_of() method is aware of the existence of the
commit-graph file. It checks for generation_numbers_enabled() before
deciding on using can_all_from_reach() or repo_in_merge_bases()
depending on the situation. The reason here is that can_all_from_reach()
uses a depth-first search that is limited by the minimum generation
number of the target commits, and that algorithm can be very slow when
generation numbers are not present. The alternative uses
paint_down_to_common() which will walk the entire merge-base boundary,
which is typically slower.

This method is used by commands like "git tag --contains" and "git
branch --contains" for very fast results when a commit-graph file
exists. Unfortunately, it is _not_ used in commands like "git merge-base
--is-ancestor" which is doing an even simpler request.

This issue was raised recently [1] with respect to a change to how
generation numbers are stored, but was also reported much earlier [2]
before commit-reach.c existed to simplify these reachability queries.

[1] https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
[2] https://lore.kernel.org/git/87608bawoa.fsf@evledraar.gmail.com/

The root cause is that builtin/merge-base.c has a method
handle_is_ancestor() that calls in_merge_bases(), an older version of
repo_in_merge_bases(). It would be better if we have every caller to
in_merge_bases() use the logic in can_all_from_reach() when possible.

This is where things get a little tricky: repo_is_descendant_of() calls
repo_in_merge_bases() in the non-generation numbers enabled case! If we
simply update repo_in_merge_bases() to call repo_is_descendant_of()
instead of repo_in_merge_bases_many(), then we will get a recursive call
loop. Thankfully, this is caught by the test suite in the default mode
(i.e. GIT_TEST_COMMIT_GRAPH=0).

The trick, then, is to make the non-generation number case for
repo_is_descendant_of() call repo_in_merge_bases_many() directly,
skipping the non-_many version. This allows us to take advantage of this
faster code path, when possible.

The easiest way to measure the performance impact is to test the
following command on the Linux kernel repository:

	git merge-base --is-ancestor <A> <B>

  | A    | B    | Time Before | Time After |
  |------|------|-------------|------------|
  | v3.0 | v5.7 | 0.459s      | 0.028s     |
  | v4.0 | v5.7 | 0.267s      | 0.021s     |
  | v5.0 | v5.7 | 0.074s      | 0.013s     |

Note that each of these samples return success. The old code performed
the same operation when <A> and <B> are swapped. However,
can_all_from_reach() will return immediately if the generation numbers
show that <A> has larger generation number than <B>. Thus, the time for
the swapped case is universally 0.004s in each case.

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 13722430aa5..43e303d5f25 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -303,7 +303,7 @@ static int repo_is_descendant_of(struct repository *r,
 
 			other = with_commit->item;
 			with_commit = with_commit->next;
-			if (repo_in_merge_bases(r, other, commit))
+			if (repo_in_merge_bases_many(r, other, 1, &commit))
 				return 1;
 		}
 		return 0;
@@ -355,7 +355,15 @@ int repo_in_merge_bases(struct repository *r,
 			struct commit *commit,
 			struct commit *reference)
 {
-	return repo_in_merge_bases_many(r, commit, 1, &reference);
+	int res;
+	struct commit_list *list = NULL;
+	struct commit_list **next = &list;
+
+	next = commit_list_append(commit, next);
+	res = repo_is_descendant_of(r, reference, list);
+	free_commit_list(list);
+
+	return res;
 }
 
 struct commit_list *reduce_heads(struct commit_list *heads)
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Accelerate "git merge-base --is-ancestor"
  2020-06-17 17:24 [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Derrick Stolee via GitGitGadget
  2020-06-17 17:24 ` [PATCH 1/2] commit-reach: create repo_is_descendant_of() Derrick Stolee via GitGitGadget
  2020-06-17 17:24 ` [PATCH 2/2] commit-reach: use fast logic in repo_in_merge_base Derrick Stolee via GitGitGadget
@ 2020-06-17 20:46 ` Junio C Hamano
  2020-06-18  1:37   ` Derrick Stolee
  2020-06-19  6:10 ` Abhishek Kumar
  3 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2020-06-17 20:46 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, szeder.dev, avarab, abhishekkumar8222, me, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> It was recently [1] reported (and not-so-recently [2] reported) that "git
> merge-base --is-ancestor" can be pretty slow. In fact, it is regularly
> slower than "git branch --contains" or "git tag --contains", which are
> answering a "harder" query.
>
> [1] https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
>
> [2] https://lore.kernel.org/git/87608bawoa.fsf@evledraar.gmail.com/
>
> The root cause is that the in_merge_base() implementation is skipping the
> fast can_all_from_reach() implementation and using paint_down_to_common()
> instead. Note that these are equivalent: a commit A is in the set of
> merge-bases between A and B if and only if B can reach A.

OK, so in short, this codepath was not taking advantage of the
generation numbers and that was the difference?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Accelerate "git merge-base --is-ancestor"
  2020-06-17 20:46 ` [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Junio C Hamano
@ 2020-06-18  1:37   ` Derrick Stolee
  0 siblings, 0 replies; 8+ messages in thread
From: Derrick Stolee @ 2020-06-18  1:37 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, szeder.dev, avarab, abhishekkumar8222, me, Derrick Stolee

On 6/17/2020 4:46 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> It was recently [1] reported (and not-so-recently [2] reported) that "git
>> merge-base --is-ancestor" can be pretty slow. In fact, it is regularly
>> slower than "git branch --contains" or "git tag --contains", which are
>> answering a "harder" query.
>>
>> [1] https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
>>
>> [2] https://lore.kernel.org/git/87608bawoa.fsf@evledraar.gmail.com/
>>
>> The root cause is that the in_merge_base() implementation is skipping the
>> fast can_all_from_reach() implementation and using paint_down_to_common()
>> instead. Note that these are equivalent: a commit A is in the set of
>> merge-bases between A and B if and only if B can reach A.
> 
> OK, so in short, this codepath was not taking advantage of the
> generation numbers and that was the difference?

Correct. We now use generation numbers more often, when
available. All callers of in_merge_bases() will become
faster, which includes callers in these builtins:

	- branch
	- fetch
	- log
	- merge-base
	- receive-pack

and these libgit files:

	- fast-import.c
	- http-push.c
	- merge-recursive.c (several!)
	- pack-bitmap-write.c

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Accelerate "git merge-base --is-ancestor"
  2020-06-17 17:24 [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-06-17 20:46 ` [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Junio C Hamano
@ 2020-06-19  6:10 ` Abhishek Kumar
  3 siblings, 0 replies; 8+ messages in thread
From: Abhishek Kumar @ 2020-06-19  6:10 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: dstolee, git, szeder.dev

On Wed, Jun 17, 2020 at 05:24:27PM +0000, Derrick Stolee via GitGitGadget wrote:
> It was recently [1] reported (and not-so-recently [2] reported) that "git
> merge-base --is-ancestor" can be pretty slow. In fact, it is regularly
> slower than "git branch --contains" or "git tag --contains", which are
> answering a "harder" query.
> 
> [1] https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
> 
> [2] https://lore.kernel.org/git/87608bawoa.fsf@evledraar.gmail.com/
> 
> The root cause is that the in_merge_base() implementation is skipping the
> fast can_all_from_reach() implementation and using paint_down_to_common()
> instead. Note that these are equivalent: a commit A is in the set of
> merge-bases between A and B if and only if B can reach A.
> 
> This fixes the issue, and makes the performance degradation reported by
> Szeder a non-issue.
> 
> Thanks, -Stolee
> 
> Derrick Stolee (2):
>   commit-reach: create repo_is_descendant_of()
>   commit-reach: use fast logic in repo_in_merge_base
> 
>  commit-reach.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> 
> base-commit: b3d7a52fac39193503a0b6728771d1bf6a161464
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-664%2Fderrickstolee%2Fmerge-base-is-ancestor-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-664/derrickstolee/merge-base-is-ancestor-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/664
> -- 
> gitgitgadget

Wow! Thanks for investigating through the issue and following up. The
performance numbers speak for themselves.

By applying this series on the commit-slab patch series, both are now
just as fast (master: 0.048s, commit-slab: 0.050s). 

Regards
Abhishek

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] commit-reach: create repo_is_descendant_of()
  2020-06-17 17:24 ` [PATCH 1/2] commit-reach: create repo_is_descendant_of() Derrick Stolee via GitGitGadget
@ 2020-06-29 13:40   ` Taylor Blau
  2020-06-30  1:45     ` Derrick Stolee
  0 siblings, 1 reply; 8+ messages in thread
From: Taylor Blau @ 2020-06-29 13:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, szeder.dev, avarab, abhishekkumar8222, me, Derrick Stolee

On Wed, Jun 17, 2020 at 05:24:28PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The next change will make repo_in_merge_bases() depend on the logic in
> is_descendant_of(), but we need to make the method independent of
> the_repository first.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-reach.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/commit-reach.c b/commit-reach.c
> index 4ca7e706a18..13722430aa5 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -283,7 +283,9 @@ struct commit_list *repo_get_merge_bases(struct repository *r,
>  /*
>   * Is "commit" a descendant of one of the elements on the "with_commit" list?
>   */
> -int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
> +static int repo_is_descendant_of(struct repository *r,
> +				 struct commit *commit,
> +				 struct commit_list *with_commit)
>  {
>  	if (!with_commit)
>  		return 1;
> @@ -301,13 +303,18 @@ int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
>
>  			other = with_commit->item;
>  			with_commit = with_commit->next;
> -			if (in_merge_bases(other, commit))
> +			if (repo_in_merge_bases(r, other, commit))
>  				return 1;
>  		}
>  		return 0;
>  	}
>  }
>
> +int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
> +{
> +	return repo_is_descendant_of(the_repository, commit, with_commit);
> +}
> +

I don't think that it makes a big deal either way, but I wonder about
moving 'repo_is_descendant_of' to the header file, and making
'is_descendant_of' be 'static inline int' as you defined it here.

Since this has already graduated up to master already, I don't think
that it's worth going back just to shuffle this code around, but I was
wondering if you had any specific reason for doing it this way.

>  /*
>   * Is "commit" an ancestor of one of the "references"?
>   */
> --
> gitgitgadget
>
Thanks,
Taylor

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] commit-reach: create repo_is_descendant_of()
  2020-06-29 13:40   ` Taylor Blau
@ 2020-06-30  1:45     ` Derrick Stolee
  0 siblings, 0 replies; 8+ messages in thread
From: Derrick Stolee @ 2020-06-30  1:45 UTC (permalink / raw)
  To: Taylor Blau, Derrick Stolee via GitGitGadget
  Cc: git, szeder.dev, avarab, abhishekkumar8222, Derrick Stolee

On 6/29/2020 9:40 AM, Taylor Blau wrote:
> On Wed, Jun 17, 2020 at 05:24:28PM +0000, Derrick Stolee via GitGitGadget wrote:
>> +int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
>> +{
>> +	return repo_is_descendant_of(the_repository, commit, with_commit);
>> +}
>> +
> 
> I don't think that it makes a big deal either way, but I wonder about
> moving 'repo_is_descendant_of' to the header file, and making
> 'is_descendant_of' be 'static inline int' as you defined it here.
> 
> Since this has already graduated up to master already, I don't think
> that it's worth going back just to shuffle this code around, but I was
> wondering if you had any specific reason for doing it this way.

I have good news for you. [1]

[1] https://lore.kernel.org/git/20200623184222.54201-1-carenas@gmail.com/

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-17 17:24 [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Derrick Stolee via GitGitGadget
2020-06-17 17:24 ` [PATCH 1/2] commit-reach: create repo_is_descendant_of() Derrick Stolee via GitGitGadget
2020-06-29 13:40   ` Taylor Blau
2020-06-30  1:45     ` Derrick Stolee
2020-06-17 17:24 ` [PATCH 2/2] commit-reach: use fast logic in repo_in_merge_base Derrick Stolee via GitGitGadget
2020-06-17 20:46 ` [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Junio C Hamano
2020-06-18  1:37   ` Derrick Stolee
2020-06-19  6:10 ` Abhishek Kumar

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git