git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v4 08/10] commit: add short-circuit to paint_down_to_common()
Date: Tue, 1 May 2018 07:47:01 -0400	[thread overview]
Message-ID: <3e3ae7e4-9efd-3046-c0c2-7b3bf12d0d81@gmail.com> (raw)
In-Reply-To: <864ljsgwx1.fsf@gmail.com>

On 4/30/2018 6:19 PM, Jakub Narebski wrote:
> Derrick Stolee <dstolee@microsoft.com> writes:
>
>> When running 'git branch --contains', the in_merge_bases_many()
>> method calls paint_down_to_common() to discover if a specific
>> commit is reachable from a set of branches. Commits with lower
>> generation number are not needed to correctly answer the
>> containment query of in_merge_bases_many().
>>
>> Add a new parameter, min_generation, to paint_down_to_common() that
>> prevents walking commits with generation number strictly less than
>> min_generation. If 0 is given, then there is no functional change.
> This is thanks to the fact that generation numbers start at zero (as
> special case, though, with _ZERO), and we use strict inequality to avoid
> handling _ZERO etc. in a special way.
>
> As you wrote in response in previous version of this series, because
> paint_down_to_common() is file-local, there is no need to come up with
> symbolic name for GENERATION_NO_CUTOFF case.
>
> All right.
>
>> For in_merge_bases_many(), we can pass commit->generation as the
>> cutoff, and this saves time during 'git branch --contains' queries
>> that would otherwise walk "around" the commit we are inspecting.
> All right, and when using paint_down_to_common() to actually find merge
> bases, and not only check for containment, we cannot use cutoff.
> Therefore at least one call site needs to run it without functional
> change... which we can do.  Good.
>
>> For a copy of the Linux repository, where HEAD is checked out at
>> v4.13~100, we get the following performance improvement for
>> 'git branch --contains' over the previous commit:
>>
>> Before: 0.21s
>> After:  0.13s
>> Rel %: -38%
> Nice.
>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   commit.c | 18 ++++++++++++++----
>>   1 file changed, 14 insertions(+), 4 deletions(-)
> Let me reorder chunks a bit to make it easier to review.
>
>> diff --git a/commit.c b/commit.c
>> index 7bb007f56a..e2e16ea1a7 100644
>> --- a/commit.c
>> +++ b/commit.c
>> @@ -1070,7 +1080,7 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit *
>>   	if (commit->generation > min_generation)
>>   		return ret;
>>   
>> -	bases = paint_down_to_common(commit, nr_reference, reference);
>> +	bases = paint_down_to_common(commit, nr_reference, reference, commit->generation);
>>   	if (commit->object.flags & PARENT2)
>>   		ret = 1;
>>   	clear_commit_marks(commit, all_flags);
>> @@ -808,11 +808,14 @@ static int queue_has_nonstale(struct prio_queue *queue)
>>   }
>>   
>>   /* all input commits in one and twos[] must have been parsed! */
>> -static struct commit_list *paint_down_to_common(struct commit *one, int n, struct commit **twos)
>> +static struct commit_list *paint_down_to_common(struct commit *one, int n,
>> +						struct commit **twos,
>> +						int min_generation)
>>   {
>>   	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
>>   	struct commit_list *result = NULL;
>>   	int i;
>> +	uint32_t last_gen = GENERATION_NUMBER_INFINITY;
>>   
>>   	one->object.flags |= PARENT1;
>>   	if (!n) {
>> @@ -831,6 +834,13 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, struc
>>   		struct commit_list *parents;
>>   		int flags;
>>   
>> +		if (commit->generation > last_gen)
>> +			BUG("bad generation skip");
>> +		last_gen = commit->generation;
> Shouldn't we provide more information about where the problem is to the
> user, to make it easier to debug the repository / commit-graph data?
>
> Good to have this sanity check here.

This BUG() _should_ only be seen by developers who add callers which do 
not load commits from the commit-graph file. There is a chance that 
there are cases not covered by this patch and the added tests, though. 
Hopefully we catch them all by dogfooding the feature before turning it 
on by default.

I can add the following to help debug these bad situations:

+			BUG("bad generation skip %d > %d at %s",
+			    commit->generation, last_gen,
+			    oid_to_hex(&commit->object.oid));



>
>> +
>> +		if (commit->generation < min_generation)
>> +			break;
> So the reasoning for this, as far as I understand, is the following.
> Please correct me if I am wrong.
>
> The callsite with non-zero min_generation, in_merge_bases_many(), tries
> to find out if "commit" is an ancestor of one of the "references".  At
> least one of "references" is above "commit", so in_merge_bases_many()
> uses paint_down_to_common() - but is interested only if "commit" was
> painted as reachable from one of "references".
>
> Thus we can interrupt the walk if we know that none of [considered]
> commits in the queue can reach "commit"/"one" - as if they were all
> STALE.
>
> The search is done using priority queue (a bit like in Dijkstra
> algorithm), with newer commits - with larger generation numbers -
> considered first.  Thus if current commit has generation number less
> than min_generation cutoff, i.e. if it is below "commit", then all
> remaining commits in the queue are below cutoff.
>
> Good.
>
>> +
>>   		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
>>   		if (flags == (PARENT1 | PARENT2)) {
>>   			if (!(commit->object.flags & RESULT)) {
>> @@ -879,7 +889,7 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
>>   			return NULL;
>>   	}
>>   
>> -	list = paint_down_to_common(one, n, twos);
>> +	list = paint_down_to_common(one, n, twos, 0);
> When calculating merge bases there is no such possibility of an early
> return due to generation number cutoff.  All right then.
>
>>   
>>   	while (list) {
>>   		struct commit *commit = pop_commit(&list);
>> @@ -946,7 +956,7 @@ static int remove_redundant(struct commit **array, int cnt)
>>   			filled_index[filled] = j;
>>   			work[filled++] = array[j];
>>   		}
>> -		common = paint_down_to_common(array[i], filled, work);
>> +		common = paint_down_to_common(array[i], filled, work, 0);
> Here we are interested not only if "one"/array[i] is reachable from
> "twos"/work, but also if "twos" is reachable from "one".  Simple cutoff
> only works in one way, though I wonder if we couldn't use cutoff being
> minimum generation number of "one" and "twos" together.
>
> But that may be left for a separate commit (after checking that the
> above is correct).
>
> Not as simple and obvious as paint_down_to_common() used in
> in_merge_bases_any(), so it is all right.

Thanks for reporting this. Since we are only concerned about 
reachability in this method, it is a good candidate to use 
min_generation. It is also subtle enough that we should leave it as a 
separate commit. Also, we can measure performance improvements 
separately, as I will mention in my commit message (but I'll copy it here):

     For a copy of the Linux repository, we measured the following
     performance improvements:

     git merge-base v3.3 v4.5

     Before: 234 ms
      After: 208 ms
      Rel %: -11%

     git merge-base v4.3 v4.5

     Before: 102 ms
      After:  83 ms
      Rel %: -19%

     The experiments above were chosen to demonstrate that we are
     improving the filtering of the merge-base set. In the first
     example, more time is spent walking the history to find the
     set of merge bases before the remove_redundant() call. The
     starting commits are closer together in the second example,
     therefore more time is spent in remove_redundant(). The relative
     change in performance differs as expected.


>
>>   		if (array[i]->object.flags & PARENT2)
>>   			redundant[i] = 1;
>>   		for (j = 0; j < filled; j++)


  reply	other threads:[~2018-05-01 11:47 UTC|newest]

Thread overview: 162+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-03 16:51 [PATCH 0/6] Compute and consume generation numbers Derrick Stolee
2018-04-03 16:51 ` [PATCH 1/6] object.c: parse commit in graph first Derrick Stolee
2018-04-03 18:21   ` Jonathan Tan
2018-04-03 18:28     ` Jeff King
2018-04-03 18:32       ` Derrick Stolee
2018-04-03 16:51 ` [PATCH 2/6] commit: add generation number to struct commmit Derrick Stolee
2018-04-03 18:05   ` Brandon Williams
2018-04-03 18:28     ` Jeff King
2018-04-03 18:31       ` Derrick Stolee
2018-04-03 18:32       ` Brandon Williams
2018-04-03 18:44       ` Stefan Beller
2018-04-03 23:17       ` Ramsay Jones
2018-04-03 23:19         ` Jeff King
2018-04-03 18:24   ` Jonathan Tan
2018-04-03 16:51 ` [PATCH 3/6] commit-graph: compute generation numbers Derrick Stolee
2018-04-03 18:30   ` Jonathan Tan
2018-04-03 18:49     ` Stefan Beller
2018-04-03 16:51 ` [PATCH 4/6] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-03 18:31   ` Stefan Beller
2018-04-03 18:31   ` Jonathan Tan
2018-04-03 16:51 ` [PATCH 5/6] commit.c: use generation to halt paint walk Derrick Stolee
2018-04-03 19:01   ` Jonathan Tan
2018-04-03 16:51 ` [PATCH 6/6] commit-graph.txt: update future work Derrick Stolee
2018-04-03 19:04   ` Jonathan Tan
2018-04-03 16:56 ` [PATCH 0/6] Compute and consume generation numbers Derrick Stolee
2018-04-03 18:03 ` Brandon Williams
2018-04-03 18:29   ` Derrick Stolee
2018-04-03 18:47     ` Jeff King
2018-04-03 19:05       ` Jeff King
2018-04-04 15:45         ` [PATCH 7/6] ref-filter: use generation number for --contains Derrick Stolee
2018-04-04 15:45           ` [PATCH 8/6] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-04 15:48             ` Derrick Stolee
2018-04-04 17:01               ` Brandon Williams
2018-04-04 18:24               ` Jeff King
2018-04-04 18:53                 ` Derrick Stolee
2018-04-04 18:59                   ` Jeff King
2018-04-04 18:22           ` [PATCH 7/6] ref-filter: use generation number for --contains Jeff King
2018-04-04 19:06             ` Derrick Stolee
2018-04-04 19:16               ` Jeff King
2018-04-04 19:22                 ` Derrick Stolee
2018-04-04 19:42                   ` Jeff King
2018-04-04 19:45                     ` Derrick Stolee
2018-04-04 19:46                       ` Jeff King
2018-04-07 17:09     ` [PATCH 0/6] Compute and consume generation numbers Jakub Narebski
2018-04-07 16:55 ` Jakub Narebski
2018-04-08  1:06   ` Derrick Stolee
2018-04-11 19:32     ` Jakub Narebski
2018-04-11 19:58       ` Derrick Stolee
2018-04-14 16:52         ` Jakub Narebski
2018-04-21 20:44           ` Jakub Narebski
2018-04-23 13:54             ` Derrick Stolee
2018-04-09 16:41 ` [PATCH v2 00/10] " Derrick Stolee
2018-04-09 16:41   ` [PATCH v2 01/10] object.c: parse commit in graph first Derrick Stolee
2018-04-09 16:41   ` [PATCH v2 02/10] merge: check config before loading commits Derrick Stolee
2018-04-11  2:12     ` Junio C Hamano
2018-04-11 12:49       ` Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 03/10] commit: add generation number to struct commmit Derrick Stolee
2018-04-09 17:59     ` Stefan Beller
2018-04-11  2:31     ` Junio C Hamano
2018-04-11 12:57       ` Derrick Stolee
2018-04-11 23:28         ` Junio C Hamano
2018-04-09 16:42   ` [PATCH v2 04/10] commit-graph: compute generation numbers Derrick Stolee
2018-04-11  2:51     ` Junio C Hamano
2018-04-11 13:02       ` Derrick Stolee
2018-04-11 18:49         ` Stefan Beller
2018-04-11 19:26         ` Eric Sunshine
2018-04-09 16:42   ` [PATCH v2 05/10] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 06/10] commit.c: use generation to halt paint walk Derrick Stolee
2018-04-11  3:02     ` Junio C Hamano
2018-04-11 13:24       ` Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 07/10] commit-graph.txt: update future work Derrick Stolee
2018-04-12  9:12     ` Junio C Hamano
2018-04-12 11:35       ` Derrick Stolee
2018-04-13  9:53         ` Jakub Narebski
2018-04-09 16:42   ` [PATCH v2 08/10] ref-filter: use generation number for --contains Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 09/10] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 10/10] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-04-17 17:00   ` [PATCH v3 0/9] Compute and consume generation numbers Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 1/9] commit: add generation number to struct commmit Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 2/9] commit-graph: compute generation numbers Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 3/9] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-18 14:31       ` Jakub Narebski
2018-04-18 14:46         ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 4/9] commit-graph.txt: update design document Derrick Stolee
2018-04-18 19:47       ` Jakub Narebski
2018-04-17 17:00     ` [PATCH v3 5/9] ref-filter: use generation number for --contains Derrick Stolee
2018-04-18 21:02       ` Jakub Narebski
2018-04-23 14:22         ` Derrick Stolee
2018-04-24 18:56           ` Jakub Narebski
2018-04-25 14:11             ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 6/9] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-18 22:15       ` Jakub Narebski
2018-04-23 14:31         ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 7/9] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-04-18 23:19       ` Jakub Narebski
2018-04-23 14:40         ` Derrick Stolee
2018-04-23 21:38           ` Jakub Narebski
2018-04-24 12:31             ` Derrick Stolee
2018-04-19  8:32       ` Jakub Narebski
2018-04-17 17:00     ` [PATCH v3 8/9] commit-graph: always load commit-graph information Derrick Stolee
2018-04-17 17:50       ` Derrick Stolee
2018-04-19  0:02       ` Jakub Narebski
2018-04-23 14:49         ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 9/9] merge: check config before loading commits Derrick Stolee
2018-04-19  0:04     ` [PATCH v3 0/9] Compute and consume generation numbers Jakub Narebski
2018-04-23 14:54       ` Derrick Stolee
2018-04-25 14:37     ` [PATCH v4 00/10] " Derrick Stolee
2018-04-25 14:37       ` [PATCH v4 01/10] ref-filter: fix outdated comment on in_commit_list Derrick Stolee
2018-04-28 17:54         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 02/10] commit: add generation number to struct commmit Derrick Stolee
2018-04-28 22:35         ` Jakub Narebski
2018-04-30 12:05           ` Derrick Stolee
2018-04-25 14:37       ` [PATCH v4 03/10] commit-graph: compute generation numbers Derrick Stolee
2018-04-26  2:35         ` Junio C Hamano
2018-04-26 12:58           ` Derrick Stolee
2018-04-26 13:49             ` Derrick Stolee
2018-04-29  9:08         ` Jakub Narebski
2018-05-01 12:10           ` Derrick Stolee
2018-05-02 16:15             ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 04/10] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-26  3:22         ` Junio C Hamano
2018-04-26  9:02           ` Jakub Narebski
2018-04-28 14:38             ` Jakub Narebski
2018-04-29 15:40         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 05/10] commit-graph: always load commit-graph information Derrick Stolee
2018-04-29 22:14         ` Jakub Narebski
2018-05-01 12:19           ` Derrick Stolee
2018-04-29 22:18         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 06/10] ref-filter: use generation number for --contains Derrick Stolee
2018-04-30 16:34         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 07/10] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-30 17:05         ` Jakub Narebski
2018-04-25 14:38       ` [PATCH v4 08/10] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-04-30 22:19         ` Jakub Narebski
2018-05-01 11:47           ` Derrick Stolee [this message]
2018-05-02 13:05             ` Jakub Narebski
2018-05-02 13:42               ` Derrick Stolee
2018-04-25 14:38       ` [PATCH v4 09/10] merge: check config before loading commits Derrick Stolee
2018-04-30 22:54         ` Jakub Narebski
2018-05-01 11:52           ` Derrick Stolee
2018-05-02 11:41             ` Jakub Narebski
2018-04-25 14:38       ` [PATCH v4 10/10] commit-graph.txt: update design document Derrick Stolee
2018-04-30 23:32         ` Jakub Narebski
2018-05-01 12:00           ` Derrick Stolee
2018-05-02  7:57             ` Jakub Narebski
2018-04-25 14:40       ` [PATCH v4 00/10] Compute and consume generation numbers Derrick Stolee
2018-04-28 17:28         ` Jakub Narebski
2018-05-01 12:47       ` [PATCH v5 00/11] " Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 01/11] ref-filter: fix outdated comment on in_commit_list Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 02/11] commit: add generation number to struct commmit Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 03/11] commit-graph: compute generation numbers Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 04/11] commit: use generations in paint_down_to_common() Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 05/11] commit-graph: always load commit-graph information Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 06/11] ref-filter: use generation number for --contains Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 07/11] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 08/11] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 09/11] commit: use generation number in remove_redundant() Derrick Stolee
2018-05-01 15:37           ` Derrick Stolee
2018-05-03 18:45           ` Jakub Narebski
2018-05-01 12:47         ` [PATCH v5 10/11] merge: check config before loading commits Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 11/11] commit-graph.txt: update design document Derrick Stolee
2018-05-03 11:18         ` [PATCH v5 00/11] Compute and consume generation numbers Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3e3ae7e4-9efd-3046-c0c2-7b3bf12d0d81@gmail.com \
    --to=stolee@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).