git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] revision: trace topo-walk statistics
@ 2020-12-30  4:31 Derrick Stolee via GitGitGadget
  2021-01-07  1:37 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-12-30  4:31 UTC (permalink / raw)
  To: git; +Cc: me, Abhishek Kumar, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

We trace statistics about the effectiveness of changed-path Bloom
filters since 42e50e78 (revision.c: add trace2 stats around Bloom
filter usage, 2020-04-06). Add similar tracing for the topo-walk
algorithm that uses generation numbers to limit the walk size.

This information can help investigate and describe benefits to
heuristics and other changes.

The information that is printed is in JSON format and can be formatted
nicely to present as follows:

    {
	"count_explort_walked":2603,
	"count_indegree_walked":2603,
	"count_topo_walked":473
    }

Each of these values count the number of commits are visited by each of
the three "stages" of the topo-walk as detailed in b4542418 (revision.c:
generation-based topo-order algorithm, 2018-11-01).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    revision: trace topo-walk statistics
    
    Here is a patch I thought useful when investigating the performance
    implications of Abhishek's series [1]
    
    [1]
    https://lore.kernel.org/git/pull.676.v5.git.1609154168.gitgitgadget@gmail.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-828%2Fderrickstolee%2Ftrace2-topo-walk-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-828/derrickstolee/trace2-topo-walk-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/828

 revision.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/revision.c b/revision.c
index 9dff845bed6..1bb590ece78 100644
--- a/revision.c
+++ b/revision.c
@@ -3308,6 +3308,26 @@ struct topo_walk_info {
 	struct author_date_slab author_date;
 };
 
+static int topo_walk_atexit_registered;
+static unsigned int count_explore_walked;
+static unsigned int count_indegree_walked;
+static unsigned int count_topo_walked;
+
+static void trace2_topo_walk_statistics_atexit(void)
+{
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	jw_object_begin(&jw, 0);
+	jw_object_intmax(&jw, "count_explore_walked", count_explore_walked);
+	jw_object_intmax(&jw, "count_indegree_walked", count_indegree_walked);
+	jw_object_intmax(&jw, "count_topo_walked", count_topo_walked);
+	jw_end(&jw);
+
+	trace2_data_json("topo_walk", the_repository, "statistics", &jw);
+
+	jw_release(&jw);
+}
+
 static inline void test_flag_and_insert(struct prio_queue *q, struct commit *c, int flag)
 {
 	if (c->object.flags & flag)
@@ -3329,6 +3349,8 @@ static void explore_walk_step(struct rev_info *revs)
 	if (repo_parse_commit_gently(revs->repo, c, 1) < 0)
 		return;
 
+	count_explore_walked++;
+
 	if (revs->sort_order == REV_SORT_BY_AUTHOR_DATE)
 		record_author_date(&info->author_date, c);
 
@@ -3367,6 +3389,8 @@ static void indegree_walk_step(struct rev_info *revs)
 	if (repo_parse_commit_gently(revs->repo, c, 1) < 0)
 		return;
 
+	count_indegree_walked++;
+
 	explore_to_depth(revs, commit_graph_generation(c));
 
 	for (p = c->parents; p; p = p->next) {
@@ -3476,6 +3500,11 @@ static void init_topo_walk(struct rev_info *revs)
 	 */
 	if (revs->sort_order == REV_SORT_IN_GRAPH_ORDER)
 		prio_queue_reverse(&info->topo_queue);
+
+	if (trace2_is_enabled() && !topo_walk_atexit_registered) {
+		atexit(trace2_topo_walk_statistics_atexit);
+		topo_walk_atexit_registered = 1;
+	}
 }
 
 static struct commit *next_topo_commit(struct rev_info *revs)
@@ -3502,6 +3531,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 			    oid_to_hex(&commit->object.oid));
 	}
 
+	count_topo_walked++;
+
 	for (p = commit->parents; p; p = p->next) {
 		struct commit *parent = p->item;
 		int *pi;

base-commit: 71ca53e8125e36efbda17293c50027d31681a41f
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] revision: trace topo-walk statistics
  2020-12-30  4:31 [PATCH] revision: trace topo-walk statistics Derrick Stolee via GitGitGadget
@ 2021-01-07  1:37 ` Junio C Hamano
  2021-01-07  2:04   ` Derrick Stolee
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2021-01-07  1:37 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, Abhishek Kumar, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/revision.c b/revision.c
> index 9dff845bed6..1bb590ece78 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3308,6 +3308,26 @@ struct topo_walk_info {
>  	struct author_date_slab author_date;
>  };
>  
> +static int topo_walk_atexit_registered;
> +static unsigned int count_explore_walked;
> +static unsigned int count_indegree_walked;
> +static unsigned int count_topo_walked;

The revision walk machinery is designed to be callable more than
once during the lifetime of a process.  These make readers wonder
if they should be defined in "struct rev_info" to allow stats
collected per traversal.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] revision: trace topo-walk statistics
  2021-01-07  1:37 ` Junio C Hamano
@ 2021-01-07  2:04   ` Derrick Stolee
  2021-01-07  2:29     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Derrick Stolee @ 2021-01-07  2:04 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, Abhishek Kumar, Derrick Stolee, Derrick Stolee

On 1/6/2021 8:37 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/revision.c b/revision.c
>> index 9dff845bed6..1bb590ece78 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -3308,6 +3308,26 @@ struct topo_walk_info {
>>  	struct author_date_slab author_date;
>>  };
>>  
>> +static int topo_walk_atexit_registered;
>> +static unsigned int count_explore_walked;
>> +static unsigned int count_indegree_walked;
>> +static unsigned int count_topo_walked;
> 
> The revision walk machinery is designed to be callable more than
> once during the lifetime of a process.  These make readers wonder
> if they should be defined in "struct rev_info" to allow stats
> collected per traversal.

That's possible, but the use of an at-exit method means we only
report one set of statistics and the 'struct rev_info' might
be defunct.

It does limit how useful the statistics can be when there are
multiple 'struct rev_info's in use, but we also cannot trust
that the rev_infos are being cleaned up properly at the end
of the process to trigger the stats logging.

Of course, maybe that _is_ something we could guarantee, or
rather _should_ guarantee by patching any leaks. Seems like
a lot of work when these aggregate statistics will be
effective enough. But maybe I'm judging the potential work
too harshly?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] revision: trace topo-walk statistics
  2021-01-07  2:04   ` Derrick Stolee
@ 2021-01-07  2:29     ` Junio C Hamano
  2021-01-07 11:08       ` Derrick Stolee
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2021-01-07  2:29 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, Abhishek Kumar,
	Derrick Stolee, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> On 1/6/2021 8:37 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> 
>>> diff --git a/revision.c b/revision.c
>>> index 9dff845bed6..1bb590ece78 100644
>>> --- a/revision.c
>>> +++ b/revision.c
>>> @@ -3308,6 +3308,26 @@ struct topo_walk_info {
>>>  	struct author_date_slab author_date;
>>>  };
>>>  
>>> +static int topo_walk_atexit_registered;
>>> +static unsigned int count_explore_walked;
>>> +static unsigned int count_indegree_walked;
>>> +static unsigned int count_topo_walked;
>> 
>> The revision walk machinery is designed to be callable more than
>> once during the lifetime of a process.  These make readers wonder
>> if they should be defined in "struct rev_info" to allow stats
>> collected per traversal.
>
> That's possible, but the use of an at-exit method means we only
> report one set of statistics and the 'struct rev_info' might
> be defunct.

Ah, sorry for the noise.  Even after making multiple traversal we
want to report the aggregate.

> It does limit how useful the statistics can be when there are
> multiple 'struct rev_info's in use, but we also cannot trust
> that the rev_infos are being cleaned up properly at the end
> of the process to trigger the stats logging.
>
> Of course, maybe that _is_ something we could guarantee, or
> rather _should_ guarantee by patching any leaks. Seems like
> a lot of work when these aggregate statistics will be
> effective enough. But maybe I'm judging the potential work
> too harshly?

But different subsystems would have different "per-invocation"
structure (e.g. "diff" uses "struct diff_options") and some may not
even have an appropriate structure to hang these stats on.  We may
want to design a more general mechanism that can be used to annotate
the subsystems uniformly.  While that could be a worthy longer term
goal, it certainly does not have to be part of this single-patch
topic, I would think.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] revision: trace topo-walk statistics
  2021-01-07  2:29     ` Junio C Hamano
@ 2021-01-07 11:08       ` Derrick Stolee
  0 siblings, 0 replies; 5+ messages in thread
From: Derrick Stolee @ 2021-01-07 11:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, me, Abhishek Kumar,
	Derrick Stolee, Derrick Stolee

On 1/6/2021 9:29 PM, Junio C Hamano wrote:
> But different subsystems would have different "per-invocation"
> structure (e.g. "diff" uses "struct diff_options") and some may not
> even have an appropriate structure to hang these stats on.  We may
> want to design a more general mechanism that can be used to annotate
> the subsystems uniformly.  While that could be a worthy longer term
> goal, it certainly does not have to be part of this single-patch
> topic, I would think.
 
I will give this further thought.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-01-07 11:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-30  4:31 [PATCH] revision: trace topo-walk statistics Derrick Stolee via GitGitGadget
2021-01-07  1:37 ` Junio C Hamano
2021-01-07  2:04   ` Derrick Stolee
2021-01-07  2:29     ` Junio C Hamano
2021-01-07 11:08       ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).