git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Emily Shaffer <emilyshaffer@google.com>, git@vger.kernel.org
Cc: Josh Steadmon <steadmon@google.com>
Subject: Re: [PATCH] trace2: log progress time and throughput
Date: Fri, 15 May 2020 15:37:20 -0400	[thread overview]
Message-ID: <90f800d5-c647-6750-52d1-da973f036bd0@jeffhostetler.com> (raw)
In-Reply-To: <20200512214420.36329-1-emilyshaffer@google.com>



On 5/12/20 5:44 PM, Emily Shaffer wrote:
> Rather than teaching only one operation, like 'git fetch', how to write
> down throughput to traces, we can learn about a wide range of user
> operations that may seem slow by adding tooling to the progress library
> itself. Operations which display progress are likely to be slow-running
> and the kind of thing we want to monitor for performance anyways. By
> showing object counts and data transfer size, we should be able to
> make some derived measurements to ensure operations are scaling the way
> we expect.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>

We need to be careful here.  The region_enter and _leave calls need
to always be properly nested.  Having an implicit region within the
progress code means the code path between all of the different start_ 
and stop_progress calls need to not interleave with explicit regions.

How about putting something like this in stop_progress:

     struct json_writer jw = JSON_WRITER_INIT;
     jw_object_begin(&jw, 0);
     jw_object_intmax(&jw, "total", p->total);
     if (p->throughput)
         jw_object_intmax(&jw, "throughput", p->throughput->curr_total);
     /*
      * and so on...
      * and maybe include the elapsed time in since the start_progress.
      */
     jw_end(&jw);
     trace2_data_json("progress", NULL, p->title, &jw);
     jw_release(&jw);

That will give you a single record summary of the progress meter
and you won't have to worry about any interleaving.

You could also add a bit to `struct progress` to let you opt-in
or opt-out of the message on a case-by-case basis.

Jeff



> ---
> One note: by putting trace collection into the progress library, we end
> up with data events which have titles like "Receiving objects" - not
> very machine-parseable. An alternative might be to ask for a
> machine-readable title in the progress struct, but I didn't think it was
> worth the code churn. However, I don't have experience with processing
> the trace data after it's been collected, so if this is a bigger problem
> than I think, please say so and I'll figure something out.
> 
> CI run here, although it failed on the same error Junio noted today[1]:
> https://github.com/nasamuffin/git/runs/668457062
> 
>   - Emily
> 
> [1]: https://lore.kernel.org/git/xmqqtv0kc2q1.fsf@gitster.c.googlers.com
> 
>   progress.c                  | 17 +++++++++++++++++
>   t/t0500-progress-display.sh | 26 ++++++++++++++++++++++++++
>   2 files changed, 43 insertions(+)
> 
> diff --git a/progress.c b/progress.c
> index 75633e9c5e..6d2dcff0b6 100644
> --- a/progress.c
> +++ b/progress.c
> @@ -265,6 +265,7 @@ static struct progress *start_progress_delay(const char *title, uint64_t total,
>   	progress->title_len = utf8_strwidth(title);
>   	progress->split = 0;
>   	set_progress_signal();
> +	trace2_region_enter("progress", title, the_repository);
>   	return progress;
>   }
>   
> @@ -320,6 +321,22 @@ void stop_progress(struct progress **p_progress)
>   {
>   	finish_if_sparse(*p_progress);
>   
> +	if (p_progress && *p_progress) {
> +		trace2_data_intmax("progress", the_repository, "total_objects",
> +				   (*p_progress)->total);
> +
> +		if ((*p_progress)->throughput)
> +			trace2_data_intmax("progress", the_repository,
> +					   "total_bytes",
> +					   (*p_progress)->throughput->curr_total);
> +	}
> +
> +	trace2_region_leave("progress",
> +			    p_progress && *p_progress
> +				? (*p_progress)->title
> +				: NULL,
> +			    the_repository);
> +
>   	stop_progress_msg(p_progress, _("done"));
>   }
>   
> diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
> index d2d088d9a0..1ed1df351c 100755
> --- a/t/t0500-progress-display.sh
> +++ b/t/t0500-progress-display.sh
> @@ -283,4 +283,30 @@ test_expect_success 'cover up after throughput shortens a lot' '
>   	test_i18ncmp expect out
>   '
>   
> +test_expect_success 'progress generates traces' '
> +	cat >in <<-\EOF &&
> +	throughput 102400 1000
> +	update
> +	progress 10
> +	throughput 204800 2000
> +	update
> +	progress 20
> +	throughput 307200 3000
> +	update
> +	progress 30
> +	throughput 409600 4000
> +	update
> +	progress 40
> +	EOF
> +
> +	GIT_TRACE2_EVENT="$(pwd)/trace.event" test-tool progress --total=40 \
> +		"Working hard" <in 2>stderr &&
> +
> +	# t0212/parse_events.perl intentionally omits regions and data.
> +	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
> +	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
> +	grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
> +	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
> +'
> +
>   test_done
> 

  parent reply	other threads:[~2020-05-15 19:37 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-12 21:44 [PATCH] trace2: log progress time and throughput Emily Shaffer
2020-05-12 22:14 ` Junio C Hamano
2020-05-13 19:46 ` Josh Steadmon
2020-05-15 10:59 ` Derrick Stolee
2020-05-15 15:27   ` Junio C Hamano
2020-05-15 16:09     ` Junio C Hamano
2020-05-15 16:49       ` Derrick Stolee
2020-05-15 17:00         ` Junio C Hamano
2020-05-15 19:37 ` Jeff Hostetler [this message]
2020-05-15 19:44   ` Jeff Hostetler
2021-06-21  1:24 ` Ævar Arnfjörð Bjarmason
2021-06-21 13:55   ` Elijah Newren
2021-06-21 14:51     ` Ævar Arnfjörð Bjarmason
2021-06-21 20:28       ` Elijah Newren
2021-06-23  2:55   ` Taylor Blau
2021-06-23  3:29     ` Chris Torek
2021-06-23  3:42       ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=90f800d5-c647-6750-52d1-da973f036bd0@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).