git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/9] Trace2 timers and counters and some cleanup
@ 2022-10-04 16:19 Jeff Hostetler via GitGitGadget
  2022-10-04 16:19 ` [PATCH 1/9] builtin/merge-file: fix compiler warning on MacOS with clang 11.0.0 Jeff Hostetler via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-04 16:19 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler

This patch series adds stopwatch timers and global counters to the trace2
logging facility. It also does a little housecleaning.

This is basically a rewrite of the series that I submitted back in December
2021: [1] and [2]. Hopefully, it addresses all of the concerns raised back
then and does it in a way that avoids the issues that stalled that effort.

First we start with a few housecleaning commits:

 * The first 2 commits are unrelated to this effort, but were required to
   get the existing code to compile on my Mac with Clang 11.0.0 with
   DEVELOPER=1. Those can be dropped if there is a better way to do this.

 * The 3rd commit is in response to a concern about using int rather than
   size_t for nr and alloc in an ALLOC_GROW() in existing trace2 code.

 * The 4th commit cleans up my use of the term "TLS" in my thread code.

 * The 5th and 6th commits (hopefully) clear up the misunderstandings around
   the thread_name variable in my thread context structures. My earlier
   attempts to clean and clarify this led to most of the controversies in
   the earlier patch series. Hopefully, these 2 commits will clarify
   matters.

 * The 7th commit cleans up a mostly obsolete section in the trace2 API
   documentation.

Finally, the last 2 commits add the stopwatch timers and the global
counters.
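
As an illustration of the first 2 housecleaning commits, here is a minimal
sketch of the `{ 0 }` to `{ { 0 } }` change. It assumes the warning in
question is about missing braces around a nested aggregate; the series does
not name the warning, so treat that as a guess. The `demo_inner` and
`demo_outer` types are hypothetical stand-ins, not the real `mmfile_t` or
`xmparam_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real mmfile_t or xmparam_t. */
struct demo_inner {
	char *ptr;
	long size;
};

struct demo_outer {
	struct demo_inner first; /* first member is itself an aggregate */
	int flags;
};

/*
 * Zero-initialize with one brace level per nesting level, then check
 * that every field really is zero.  Returns 1 on success.
 */
int demo_outer_is_zeroed(void)
{
	/* Outer braces: the struct.  Inner braces: its first member. */
	struct demo_outer one = { { 0 } };
	/* Outer braces: the array.  Inner braces: its first element. */
	struct demo_inner many[3] = { { 0 } };
	size_t i;

	assert(one.first.ptr == NULL && one.first.size == 0);
	assert(one.flags == 0);
	for (i = 0; i < 3; i++)
		assert(many[i].ptr == NULL && many[i].size == 0);
	return 1;
}
```

Both spellings zero the whole object; the nested form only makes the
structure of the initializer explicit.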

[1]
https://lore.kernel.org/git/pull.1099.git.1640012469.gitgitgadget@gmail.com/
[2]
https://lore.kernel.org/git/pull.1099.v2.git.1640720202.gitgitgadget@gmail.com/

Jeff Hostetler (9):
  builtin/merge-file: fix compiler warning on MacOS with clang 11.0.0
  builtin/unpack-objects.c: fix compiler warning on MacOS with clang
    11.0.0
  trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  tr2tls: clarify TLS terminology
  trace2: rename trace2 thread_name argument as name_hint
  trace2: convert ctx.thread_name to flex array
  api-trace2.txt: elminate section describing the public trace2 API
  trace2: add stopwatch timers
  trace2: add global counter mechanism

 Documentation/technical/api-trace2.txt | 190 +++++++++++++++++--------
 Makefile                               |   2 +
 builtin/merge-file.c                   |   4 +-
 builtin/unpack-objects.c               |   2 +-
 t/helper/test-trace2.c                 | 187 ++++++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  95 +++++++++++++
 t/t0211/scrub_perf.perl                |   6 +
 trace2.c                               | 121 +++++++++++++++-
 trace2.h                               | 101 +++++++++++--
 trace2/tr2_ctr.c                       | 101 +++++++++++++
 trace2/tr2_ctr.h                       | 104 ++++++++++++++
 trace2/tr2_tgt.h                       |  14 ++
 trace2/tr2_tgt_event.c                 |  47 +++++-
 trace2/tr2_tgt_normal.c                |  39 +++++
 trace2/tr2_tgt_perf.c                  |  49 ++++++-
 trace2/tr2_tls.c                       |  43 +++---
 trace2/tr2_tls.h                       |  52 ++++---
 trace2/tr2_tmr.c                       | 182 +++++++++++++++++++++++
 trace2/tr2_tmr.h                       | 140 ++++++++++++++++++
 19 files changed, 1366 insertions(+), 113 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h


base-commit: 3dcec76d9df911ed8321007b1d197c1a206dc164
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1373%2Fjeffhostetler%2Ftrace2-stopwatch-v4-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1373/jeffhostetler/trace2-stopwatch-v4-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1373
-- 
gitgitgadget


* [PATCH 1/9] builtin/merge-file: fix compiler warning on MacOS with clang 11.0.0
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
@ 2022-10-04 16:19 ` Jeff Hostetler via GitGitGadget
  2022-10-04 16:20 ` [PATCH 2/9] builtin/unpack-objects.c: " Jeff Hostetler via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-04 16:19 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/merge-file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/merge-file.c b/builtin/merge-file.c
index c923bbf2abb..607c3d3f9e1 100644
--- a/builtin/merge-file.c
+++ b/builtin/merge-file.c
@@ -26,9 +26,9 @@ static int label_cb(const struct option *opt, const char *arg, int unset)
 int cmd_merge_file(int argc, const char **argv, const char *prefix)
 {
 	const char *names[3] = { 0 };
-	mmfile_t mmfs[3] = { 0 };
+	mmfile_t mmfs[3] = { { 0 } };
 	mmbuffer_t result = { 0 };
-	xmparam_t xmp = { 0 };
+	xmparam_t xmp = { { 0 } };
 	int ret = 0, i = 0, to_stdout = 0;
 	int quiet = 0;
 	struct option options[] = {
-- 
gitgitgadget



* [PATCH 2/9] builtin/unpack-objects.c: fix compiler warning on MacOS with clang 11.0.0
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
  2022-10-04 16:19 ` [PATCH 1/9] builtin/merge-file: fix compiler warning on MacOS with clang 11.0.0 Jeff Hostetler via GitGitGadget
@ 2022-10-04 16:20 ` Jeff Hostetler via GitGitGadget
  2022-10-04 16:20 ` [PATCH 3/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-04 16:20 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/unpack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 43789b8ef29..4b16f1592ba 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -385,7 +385,7 @@ static const void *feed_input_zstream(struct input_stream *in_stream,
 
 static void stream_blob(unsigned long size, unsigned nr)
 {
-	git_zstream zstream = { 0 };
+	git_zstream zstream = { { 0 } };
 	struct input_zstream_data data = { 0 };
 	struct input_stream in_stream = {
 		.read = feed_input_zstream,
-- 
gitgitgadget



* [PATCH 3/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
  2022-10-04 16:19 ` [PATCH 1/9] builtin/merge-file: fix compiler warning on MacOS with clang 11.0.0 Jeff Hostetler via GitGitGadget
  2022-10-04 16:20 ` [PATCH 2/9] builtin/unpack-objects.c: " Jeff Hostetler via GitGitGadget
@ 2022-10-04 16:20 ` Jeff Hostetler via GitGitGadget
  2022-10-04 16:20 ` [PATCH 4/9] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-04 16:20 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use "size_t" rather than "int" for the "alloc" and "nr_open_regions"
fields in the "tr2tls_thread_ctx".  These are used by ALLOC_GROW().
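
A minimal sketch of the grow-on-demand pattern these fields take part in,
with both counters typed as `size_t`. The `demo_region_stack` type and
`demo_region_stack_push()` helper are hypothetical and only imitate the
idea; this is not Git's ALLOC_GROW() macro, and the growth factor is an
assumption chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread stack of region start times. */
struct demo_region_stack {
	uint64_t *items;
	size_t alloc; /* capacity, in elements */
	size_t nr;    /* elements in use */
};

/*
 * Push one value, growing the array when it is full.
 * Returns 0 on success, -1 if the allocation fails.
 */
int demo_region_stack_push(struct demo_region_stack *s, uint64_t value)
{
	assert(s != NULL);

	if (s->nr == s->alloc) {
		/* Growth policy is an assumption made for this sketch. */
		size_t new_alloc = (s->alloc + 16) * 3 / 2;
		uint64_t *p = realloc(s->items, new_alloc * sizeof(*p));

		if (!p)
			return -1;
		s->items = p;
		s->alloc = new_alloc;
	}
	s->items[s->nr++] = value;
	return 0;
}
```

Keeping the count and the capacity in the same unsigned type as the
allocation size avoids signed/unsigned comparisons in the growth check.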

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tls.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index b1e327a928e..a90bd639d48 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -11,8 +11,8 @@
 struct tr2tls_thread_ctx {
 	struct strbuf thread_name;
 	uint64_t *array_us_start;
-	int alloc;
-	int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
+	size_t alloc;
+	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 };
 
-- 
gitgitgadget



* [PATCH 4/9] tr2tls: clarify TLS terminology
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-10-04 16:20 ` [PATCH 3/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
@ 2022-10-04 16:20 ` Jeff Hostetler via GitGitGadget
  2022-10-04 16:20 ` [PATCH 5/9] trace2: rename trace2 thread_name argument as name_hint Jeff Hostetler via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-04 16:20 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Reduce or eliminate use of the term "TLS" in the Trace2 code.

The term "TLS" has two popular meanings: "thread-local storage" and
"transport layer security".  In the Trace2 source, the term is associated
with the former.  There was concern on the mailing list about it referring
to the latter.

Update the source and documentation to eliminate the use of the "TLS" term
or replace it with the phrase "thread-local storage" to reduce ambiguity.
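
For readers who only know the other meaning of the term, here is a minimal
sketch of thread-local storage using POSIX thread-specific data. All of the
`demo_` names are hypothetical; the only detail taken from the Trace2 source
is that it looks up its context with pthread_getspecific().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread context; not the real tr2tls_thread_ctx. */
struct demo_ctx {
	int thread_id;
};

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

static void demo_key_init(void)
{
	/* free() runs on a thread's value when that thread exits. */
	pthread_key_create(&demo_key, free);
}

/* Return the calling thread's context, creating it on first use. */
struct demo_ctx *demo_get_self(void)
{
	struct demo_ctx *ctx;

	pthread_once(&demo_once, demo_key_init);
	ctx = pthread_getspecific(demo_key);
	if (!ctx) {
		ctx = calloc(1, sizeof(*ctx));
		if (ctx)
			pthread_setspecific(demo_key, ctx);
	}
	return ctx;
}

static void *demo_thread_proc(void *arg)
{
	int *slot = arg;
	struct demo_ctx *ctx = demo_get_self();
	int id = *slot;

	*slot = -1;
	if (!ctx)
		return NULL;
	ctx->thread_id = id;                  /* write to this thread's copy */
	*slot = demo_get_self()->thread_id;   /* read it back via the key */
	return NULL;
}

/* Run a thread that stores "id" in its own context; return what it saw. */
int demo_run_thread(int id)
{
	pthread_t t;
	int slot = id;

	if (pthread_create(&t, NULL, demo_thread_proc, &slot))
		return -1;
	pthread_join(t, NULL);
	return slot;
}
```

Each thread sees its own `demo_ctx` through the same key, so a value set in
the new thread does not disturb the one held by the calling thread.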

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt |  8 ++++----
 trace2.c                               |  2 +-
 trace2.h                               | 10 +++++-----
 trace2/tr2_tls.c                       |  6 +++---
 trace2/tr2_tls.h                       | 18 +++++++++++-------
 5 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 2afa28bb5aa..431d424f9d5 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -685,8 +685,8 @@ The "exec_id" field is a command-unique id and is only useful if the
 
 `"thread_start"`::
 	This event is generated when a thread is started.  It is
-	generated from *within* the new thread's thread-proc (for TLS
-	reasons).
+	generated from *within* the new thread's thread-proc (because
+	it needs to access data in the thread's thread-local storage).
 +
 ------------
 {
@@ -698,7 +698,7 @@ The "exec_id" field is a command-unique id and is only useful if the
 
 `"thread_exit"`::
 	This event is generated when a thread exits.  It is generated
-	from *within* the thread's thread-proc (for TLS reasons).
+	from *within* the thread's thread-proc.
 +
 ------------
 {
@@ -1206,7 +1206,7 @@ worked on 508 items at offset 2032.  Thread "th04" worked on 508 items
 at offset 508.
 +
 This example also shows that thread names are assigned in a racy manner
-as each thread starts and allocates TLS storage.
+as each thread starts.
 
 Config (def param) Events::
 
diff --git a/trace2.c b/trace2.c
index 0c0a11e07d5..c1244e45ace 100644
--- a/trace2.c
+++ b/trace2.c
@@ -52,7 +52,7 @@ static struct tr2_tgt *tr2_tgt_builtins[] =
  * Force (rather than lazily) initialize any of the requested
  * builtin TRACE2 targets at startup (and before we've seen an
  * actual TRACE2 event call) so we can see if we need to setup
- * the TR2 and TLS machinery.
+ * private data structures and thread-local storage.
  *
  * Return the number of builtin targets enabled.
  */
diff --git a/trace2.h b/trace2.h
index 88d906ea830..af3c11694cc 100644
--- a/trace2.h
+++ b/trace2.h
@@ -73,8 +73,7 @@ void trace2_initialize_clock(void);
 /*
  * Initialize TRACE2 tracing facility if any of the builtin TRACE2
  * targets are enabled in the system config or the environment.
- * This includes setting up the Trace2 thread local storage (TLS).
- * Emits a 'version' message containing the version of git
+ * This emits a 'version' message containing the version of git
  * and the Trace2 protocol.
  *
  * This function should be called from `main()` as early as possible in
@@ -302,7 +301,8 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code);
 
 /*
  * Emit a 'thread_start' event.  This must be called from inside the
- * thread-proc to set up the trace2 TLS data for the thread.
+ * thread-proc to allow the thread to create its own thread-local
+ * storage.
  *
  * Thread names should be descriptive, like "preload_index".
  * Thread names will be decorated with an instance number automatically.
@@ -315,8 +315,8 @@ void trace2_thread_start_fl(const char *file, int line,
 
 /*
  * Emit a 'thread_exit' event.  This must be called from inside the
- * thread-proc to report thread-specific data and cleanup TLS data
- * for the thread.
+ * thread-proc so that the thread can access and clean up its
+ * thread-local storage.
  */
 void trace2_thread_exit_fl(const char *file, int line);
 
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 7da94aba522..8d2182fbdbb 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -69,9 +69,9 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void)
 	ctx = pthread_getspecific(tr2tls_key);
 
 	/*
-	 * If the thread-proc did not call trace2_thread_start(), we won't
-	 * have any TLS data associated with the current thread.  Fix it
-	 * here and silently continue.
+	 * If the current thread's thread-proc did not call
+	 * trace2_thread_start(), then the thread will not have any
+	 * thread-local storage.  Create it now and silently continue.
 	 */
 	if (!ctx)
 		ctx = tr2tls_create_self("unknown", getnanotime() / 1000);
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index a90bd639d48..1297509fd23 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -3,6 +3,12 @@
 
 #include "strbuf.h"
 
+/*
+ * Notice: the term "TLS" refers to "thread-local storage" in the
+ * Trace2 source files.  This usage is borrowed from GCC and Windows.
+ * There is NO relation to "transport layer security".
+ */
+
 /*
  * Arbitry limit for thread names for column alignment.
  */
@@ -17,9 +23,7 @@ struct tr2tls_thread_ctx {
 };
 
 /*
- * Create TLS data for the current thread.  This gives us a place to
- * put per-thread data, such as thread start time, function nesting
- * and a per-thread label for our messages.
+ * Create thread-local storage for the current thread.
  *
  * We assume the first thread is "main".  Other threads are given
  * non-zero thread-ids to help distinguish messages from concurrent
@@ -35,7 +39,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 					     uint64_t us_thread_start);
 
 /*
- * Get our TLS data.
+ * Get the thread-local storage pointer of the current thread.
  */
 struct tr2tls_thread_ctx *tr2tls_get_self(void);
 
@@ -45,7 +49,7 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void);
 int tr2tls_is_main_thread(void);
 
 /*
- * Free our TLS data.
+ * Free the current thread's thread-local storage.
  */
 void tr2tls_unset_self(void);
 
@@ -81,12 +85,12 @@ uint64_t tr2tls_region_elasped_self(uint64_t us);
 uint64_t tr2tls_absolute_elapsed(uint64_t us);
 
 /*
- * Initialize the tr2 TLS system.
+ * Initialize thread-local storage for Trace2.
  */
 void tr2tls_init(void);
 
 /*
- * Free all tr2 TLS resources.
+ * Free all Trace2 thread-local storage resources.
  */
 void tr2tls_release(void);
 
-- 
gitgitgadget



* [PATCH 5/9] trace2: rename trace2 thread_name argument as name_hint
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-10-04 16:20 ` [PATCH 4/9] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
@ 2022-10-04 16:20 ` Jeff Hostetler via GitGitGadget
  2022-10-04 16:20 ` [PATCH 6/9] trace2: convert ctx.thread_name to flex array Jeff Hostetler via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-04 16:20 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Rename the `thread_name` argument in `tr2tls_create_self()`
and `trace2_thread_start()` to be `name_hint` to make it clear
that the passed argument is a hint that will be used to create
the actual `struct tr2tls_thread_ctx.thread_name` variable.

This should make it clearer in the API that the trace2 layer
does not borrow the caller's string pointer/buffer, but rather
that it will use that hint in formatting the actual thread's
name.  Previous discussion on the mailing list indicated that
there was confusion about this point.

This commit does not change how the `thread_name` field is
allocated or stored within the `tr2tls_thread_ctx` structure.
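
A minimal sketch of the "hint, not borrowed pointer" idea: the name is
formatted into storage owned by the callee, so the caller is free to reuse
or change its buffer afterwards. The `demo_format_thread_name()` helper is
hypothetical; the "th%02d:" prefix and the unprefixed name for thread id 0
follow what `tr2tls_create_self()` does in this series.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build a thread name from an id and a caller-supplied hint.
 * The result is copied into "out"; the hint is not retained.
 * Returns the snprintf() result (the length the full name needs).
 */
int demo_format_thread_name(char *out, size_t out_size,
			    int thread_id, const char *name_hint)
{
	assert(out != NULL && name_hint != NULL);

	if (thread_id)
		return snprintf(out, out_size, "th%02d:%s",
				thread_id, name_hint);
	/* Thread id 0 (the first thread) gets no prefix. */
	return snprintf(out, out_size, "%s", name_hint);
}
```

For example, a hint of "preload_index" with thread id 3 yields
"th03:preload_index", and later changes to the caller's hint buffer do not
affect the stored name.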

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt |  2 +-
 trace2.c                               |  6 +++---
 trace2.h                               | 11 ++++++-----
 trace2/tr2_tls.c                       |  4 ++--
 trace2/tr2_tls.h                       | 17 ++++++++++-------
 5 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 431d424f9d5..4fe2d6992ab 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -209,7 +209,7 @@ e.g: `void trace2_child_start(struct child_process *cmd)`.
 
 These messages are concerned with Git thread usage.
 
-e.g: `void trace2_thread_start(const char *thread_name)`.
+e.g: `void trace2_thread_start(const char *name_hint)`.
 
 === Region and Data Messages
 
diff --git a/trace2.c b/trace2.c
index c1244e45ace..c8e5acced2a 100644
--- a/trace2.c
+++ b/trace2.c
@@ -466,7 +466,7 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code)
 				file, line, us_elapsed_absolute, exec_id, code);
 }
 
-void trace2_thread_start_fl(const char *file, int line, const char *thread_name)
+void trace2_thread_start_fl(const char *file, int line, const char *name_hint)
 {
 	struct tr2_tgt *tgt_j;
 	int j;
@@ -488,14 +488,14 @@ void trace2_thread_start_fl(const char *file, int line, const char *thread_name)
 		 */
 		trace2_region_enter_printf_fl(file, line, NULL, NULL, NULL,
 					      "thread-proc on main: %s",
-					      thread_name);
+					      name_hint);
 		return;
 	}
 
 	us_now = getnanotime() / 1000;
 	us_elapsed_absolute = tr2tls_absolute_elapsed(us_now);
 
-	tr2tls_create_self(thread_name, us_now);
+	tr2tls_create_self(name_hint, us_now);
 
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_thread_start_fl)
diff --git a/trace2.h b/trace2.h
index af3c11694cc..fe39dcb5849 100644
--- a/trace2.h
+++ b/trace2.h
@@ -304,14 +304,15 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code);
  * thread-proc to allow the thread to create its own thread-local
  * storage.
  *
- * Thread names should be descriptive, like "preload_index".
- * Thread names will be decorated with an instance number automatically.
+ * The thread name hint should be descriptive, like "preload_index" or
+ * taken from the thread-proc function.  A unique thread name will be
+ * created from the hint and the thread id automatically.
  */
 void trace2_thread_start_fl(const char *file, int line,
-			    const char *thread_name);
+			    const char *name_hint);
 
-#define trace2_thread_start(thread_name) \
-	trace2_thread_start_fl(__FILE__, __LINE__, (thread_name))
+#define trace2_thread_start(name_hint) \
+	trace2_thread_start_fl(__FILE__, __LINE__, (name_hint))
 
 /*
  * Emit a 'thread_exit' event.  This must be called from inside the
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 8d2182fbdbb..39b41fd2487 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -31,7 +31,7 @@ void tr2tls_start_process_clock(void)
 	tr2tls_us_start_process = getnanotime() / 1000;
 }
 
-struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
+struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
 					     uint64_t us_thread_start)
 {
 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
@@ -50,7 +50,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 	strbuf_init(&ctx->thread_name, 0);
 	if (ctx->thread_id)
 		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_name);
+	strbuf_addstr(&ctx->thread_name, name_hint);
 	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
 		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
 
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 1297509fd23..f1ee58305d6 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -25,17 +25,20 @@ struct tr2tls_thread_ctx {
 /*
  * Create thread-local storage for the current thread.
  *
- * We assume the first thread is "main".  Other threads are given
- * non-zero thread-ids to help distinguish messages from concurrent
- * threads.
- *
- * Truncate the thread name if necessary to help with column alignment
- * in printf-style messages.
+ * The first thread in the process will have:
+ *     { .thread_id=0, .thread_name="main" }
+ * Subsequent threads are given a non-zero thread_id and a thread_name
+ * constructed from the id and a "name hint" (which is usually based
+ * upon the name of the thread-proc function).  For example:
+ *     { .thread_id=10, .thread_name="th10fsm-listen" }
+ * This helps to identify and distinguish messages from concurrent threads.
+ * The ctx.thread_name field is truncated if necessary to help with column
+ * alignment in printf-style messages.
  *
  * In this and all following functions the term "self" refers to the
  * current thread.
  */
-struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
+struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
 					     uint64_t us_thread_start);
 
 /*
-- 
gitgitgadget



* [PATCH 6/9] trace2: convert ctx.thread_name to flex array
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-10-04 16:20 ` [PATCH 5/9] trace2: rename trace2 thread_name argument as name_hint Jeff Hostetler via GitGitGadget
@ 2022-10-04 16:20 ` Jeff Hostetler via GitGitGadget
  2022-10-05 11:14   ` Ævar Arnfjörð Bjarmason
  2022-10-05 18:03   ` Junio C Hamano
  2022-10-04 16:20 ` [PATCH 7/9] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-04 16:20 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
to a "flex array" at the end of the context structure.

The `thread_name` field is a constant string that is constructed when
the context is created.  Using a (non-const) `strbuf` structure for it
caused some confusion in the past because it implied that someone
could rename a thread after it was created.  That usage was not
intended.  Changing it to a "flex array" will hopefully make the
intent more clear.
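
A minimal sketch of the flex-array layout, written with a plain C99
flexible array member instead of Git's FLEX_ARRAY and FLEX_ALLOC_MEM
macros. The `demo_thread_ctx` type and `demo_ctx_new()` helper are
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context; the name lives in the same allocation. */
struct demo_thread_ctx {
	int thread_id;
	char thread_name[]; /* flexible array member; must be last */
};

/*
 * Allocate a context with room for a copy of "name".
 * Returns NULL if the allocation fails; release with free().
 */
struct demo_thread_ctx *demo_ctx_new(int thread_id, const char *name)
{
	struct demo_thread_ctx *ctx;
	size_t len;

	assert(name != NULL);
	len = strlen(name);

	/* One block: the fixed fields, the name bytes, and a NUL. */
	ctx = calloc(1, sizeof(*ctx) + len + 1);
	if (!ctx)
		return NULL;
	ctx->thread_id = thread_id;
	memcpy(ctx->thread_name, name, len); /* calloc() supplied the NUL */
	return ctx;
}
```

Because the name is sized at allocation time, there is no obvious way to
rename the thread later, which matches the intent described above.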

Also, move the maximum thread_name truncation to tr2_tgt_perf.c
because it is the only target that needs to worry about output column
alignment.
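
A minimal sketch of truncating at print time with the "%-*.*s" conversion
that the perf target uses in this patch. The field width pads short names
and the precision cuts long ones, so the stored name never needs to be
shortened. The `demo_format_column()` helper and `DEMO_MAX_THREAD_NAME`
constant are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_MAX_THREAD_NAME 24

/*
 * Format a thread name into a fixed-width, left-justified column
 * followed by " |".  Returns the snprintf() result.
 */
int demo_format_column(char *out, size_t out_size, const char *thread_name)
{
	assert(out != NULL && thread_name != NULL);

	return snprintf(out, out_size, "%-*.*s |",
			DEMO_MAX_THREAD_NAME, /* minimum width: pad */
			DEMO_MAX_THREAD_NAME, /* precision: truncate */
			thread_name);
}
```

The column is always DEMO_MAX_THREAD_NAME characters wide, whatever the
length of the name.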

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tgt_event.c |  2 +-
 trace2/tr2_tgt_perf.c  |  8 ++++++--
 trace2/tr2_tls.c       | 25 +++++++++++++------------
 trace2/tr2_tls.h       |  9 +--------
 4 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 37a3163be12..52f9356c695 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -90,7 +90,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread", ctx->thread_name.buf);
+	jw_object_string(jw, "thread", ctx->thread_name);
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 8cb792488c8..fdeb3292d3a 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -25,6 +25,7 @@ static int tr2env_perf_be_brief;
 
 #define TR2FMT_PERF_FL_WIDTH (28)
 #define TR2FMT_PERF_MAX_EVENT_NAME (12)
+#define TR2FMT_PERF_MAX_THREAD_NAME (24)
 #define TR2FMT_PERF_REPO_WIDTH (3)
 #define TR2FMT_PERF_CATEGORY_WIDTH (12)
 
@@ -107,8 +108,11 @@ static void perf_fmt_prepare(const char *event_name,
 	}
 
 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
-	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
-		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
+	strbuf_addf(buf, "%-*.*s | %-*s | ",
+		    TR2FMT_PERF_MAX_THREAD_NAME,
+		    TR2FMT_PERF_MAX_THREAD_NAME,
+		    ctx->thread_name,
+		    TR2FMT_PERF_MAX_EVENT_NAME,
 		    event_name);
 
 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 39b41fd2487..89437e773f6 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -34,7 +34,18 @@ void tr2tls_start_process_clock(void)
 struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
 					     uint64_t us_thread_start)
 {
-	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
+	struct tr2tls_thread_ctx *ctx;
+	struct strbuf buf_name = STRBUF_INIT;
+	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
+
+	if (thread_id)
+		strbuf_addf(&buf_name, "th%02d:", thread_id);
+	strbuf_addstr(&buf_name, name_hint);
+
+	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
+	strbuf_release(&buf_name);
+
+	ctx->thread_id = thread_id;
 
 	/*
 	 * Implicitly "tr2tls_push_self()" to capture the thread's start
@@ -45,15 +56,6 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
 	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
 	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
 
-	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
-
-	strbuf_init(&ctx->thread_name, 0);
-	if (ctx->thread_id)
-		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, name_hint);
-	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
-		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
-
 	pthread_setspecific(tr2tls_key, ctx);
 
 	return ctx;
@@ -95,7 +97,6 @@ void tr2tls_unset_self(void)
 
 	pthread_setspecific(tr2tls_key, NULL);
 
-	strbuf_release(&ctx->thread_name);
 	free(ctx->array_us_start);
 	free(ctx);
 }
@@ -113,7 +114,7 @@ void tr2tls_pop_self(void)
 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 
 	if (!ctx->nr_open_regions)
-		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
+		BUG("no open regions in thread '%s'", ctx->thread_name);
 
 	ctx->nr_open_regions--;
 }
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index f1ee58305d6..be0bc73d08f 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -9,17 +9,12 @@
  * There is NO relation to "transport layer security".
  */
 
-/*
- * Arbitry limit for thread names for column alignment.
- */
-#define TR2_MAX_THREAD_NAME (24)
-
 struct tr2tls_thread_ctx {
-	struct strbuf thread_name;
 	uint64_t *array_us_start;
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
+	char thread_name[FLEX_ARRAY];
 };
 
 /*
@@ -32,8 +27,6 @@ struct tr2tls_thread_ctx {
  * upon the name of the thread-proc function).  For example:
  *     { .thread_id=10, .thread_name="th10fsm-listen" }
  * This helps to identify and distinguish messages from concurrent threads.
- * The ctx.thread_name field is truncated if necessary to help with column
- * alignment in printf-style messages.
  *
  * In this and all following functions the term "self" refers to the
  * current thread.
-- 
gitgitgadget



* [PATCH 7/9] api-trace2.txt: elminate section describing the public trace2 API
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-10-04 16:20 ` [PATCH 6/9] trace2: convert ctx.thread_name to flex array Jeff Hostetler via GitGitGadget
@ 2022-10-04 16:20 ` Jeff Hostetler via GitGitGadget
  2022-10-04 16:20 ` [PATCH 8/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-04 16:20 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Eliminate the mostly obsolete `Public API` sub-section from the
`Trace2 API` section in the documentation.  Strengthen the referral
to `trace2.h`.

Most of the technical information in this sub-section was moved to
`trace2.h` in 6c51cb525d (trace2: move doc to trace2.h, 2019-11-17) to
be adjacent to the function prototypes.  The remaining text wasn't
that useful by itself.

Furthermore, the text would need a bit of overhaul to add routines
that do not immediately generate a message, such as stopwatch timers.
So it seemed simpler to just get rid of it.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt | 61 +++-----------------------
 1 file changed, 7 insertions(+), 54 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 4fe2d6992ab..9d43909d068 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -148,20 +148,18 @@ filename collisions).
 
 == Trace2 API
 
-All public Trace2 functions and macros are defined in `trace2.h` and
-`trace2.c`.  All public symbols are prefixed with `trace2_`.
+The Trace2 public API is defined and documented in `trace2.h`; refer to it for
+more information.  All public functions and macros are prefixed
+with `trace2_` and are implemented in `trace2.c`.
 
 There are no public Trace2 data structures.
 
 The Trace2 code also defines a set of private functions and data types
 in the `trace2/` directory.  These symbols are prefixed with `tr2_`
-and should only be used by functions in `trace2.c`.
+and should only be used by functions in `trace2.c` (or other private
+source files in `trace2/`).
 
-== Conventions for Public Functions and Macros
-
-The functions defined by the Trace2 API are declared and documented
-in `trace2.h`.  It defines the API functions and wrapper macros for
-Trace2.
+=== Conventions for Public Functions and Macros
 
 Some functions have a `_fl()` suffix to indicate that they take `file`
 and `line-number` arguments.
@@ -172,52 +170,7 @@ take a `va_list` argument.
 Some functions have a `_printf_fl()` suffix to indicate that they also
 take a `printf()` style format with a variable number of arguments.
 
-There are CPP wrapper macros and `#ifdef`s to hide most of these details.
-See `trace2.h` for more details.  The following discussion will only
-describe the simplified forms.
-
-== Public API
-
-All Trace2 API functions send a message to all of the active
-Trace2 Targets.  This section describes the set of available
-messages.
-
-It helps to divide these functions into groups for discussion
-purposes.
-
-=== Basic Command Messages
-
-These are concerned with the lifetime of the overall git process.
-e.g: `void trace2_initialize_clock()`, `void trace2_initialize()`,
-`int trace2_is_enabled()`, `void trace2_cmd_start(int argc, const char **argv)`.
-
-=== Command Detail Messages
-
-These are concerned with describing the specific Git command
-after the command line, config, and environment are inspected.
-e.g: `void trace2_cmd_name(const char *name)`,
-`void trace2_cmd_mode(const char *mode)`.
-
-=== Child Process Messages
-
-These are concerned with the various spawned child processes,
-including shell scripts, git commands, editors, pagers, and hooks.
-
-e.g: `void trace2_child_start(struct child_process *cmd)`.
-
-=== Git Thread Messages
-
-These messages are concerned with Git thread usage.
-
-e.g: `void trace2_thread_start(const char *name_hint)`.
-
-=== Region and Data Messages
-
-These are concerned with recording performance data
-over regions or spans of code. e.g:
-`void trace2_region_enter(const char *category, const char *label, const struct repository *repo)`.
-
-Refer to trace2.h for details about all trace2 functions.
+CPP wrapper macros are defined to hide most of these details.
 
 == Trace2 Target Formats
 
-- 
gitgitgadget



* [PATCH 8/9] trace2: add stopwatch timers
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                   ` (6 preceding siblings ...)
  2022-10-04 16:20 ` [PATCH 7/9] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
@ 2022-10-04 16:20 ` Jeff Hostetler via GitGitGadget
  2022-10-04 16:20 ` [PATCH 9/9] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-04 16:20 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add stopwatch timer mechanism to Trace2.

Timers are an alternative to Trace2 Regions.  Regions are useful for
measuring the time spent in various computation phases, such as the
time to read the index, time to scan for unstaged files, time to scan
for untracked files, etc.

However, regions are not appropriate in all places.  For example,
during a checkout, it would be very inefficient to use regions to
measure the total time spent inflating objects from the ODB across
the entire lifetime of the process: a per-unzip() region would flood
the output and significantly slow the command, and some form of
post-processing would be required to compute the time spent in unzip().

Timers can be used to measure a series of timer intervals and emit
a single summary event (at thread and/or process exit).

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
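For readers who want the accumulation model in isolation, here is a
minimal sketch of it.  It is not the code in this patch: the
`sketch_timer` names are hypothetical, and the clock value is passed
in as a parameter (the patch itself reads the clock with
getnanotime()) so that the behavior is deterministic.  It shows how
nested starts are ignored, how min/max are seeded on first use because
the struct starts out zeroed, and how one thread's partial sums could
be folded into a process-wide total.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread timer state in this patch. */
struct sketch_timer {
	uint64_t total_ns;
	uint64_t min_ns;
	uint64_t max_ns;
	uint64_t start_ns;
	uint64_t interval_count;
	unsigned int recursion_count;
};

/* Start the stopwatch; a nested start only bumps the recursion count. */
void sketch_timer_start(struct sketch_timer *t, uint64_t now_ns)
{
	if (t->recursion_count++ > 0)
		return;
	t->start_ns = now_ns;
}

/* Stop the stopwatch; only the outermost stop closes an interval. */
void sketch_timer_stop(struct sketch_timer *t, uint64_t now_ns)
{
	uint64_t ns_interval;

	assert(t->recursion_count > 0);
	if (--t->recursion_count)
		return;

	ns_interval = now_ns - t->start_ns;
	t->total_ns += ns_interval;

	/* Fields start at zero, so seed min/max on the first interval. */
	if (!t->interval_count || ns_interval < t->min_ns)
		t->min_ns = ns_interval;
	if (!t->interval_count || ns_interval > t->max_ns)
		t->max_ns = ns_interval;
	t->interval_count++;
}

/*
 * Fold one thread's partial sums into the shared totals.  In a real
 * multi-threaded program the caller would need to hold a lock here.
 */
void sketch_timer_merge(struct sketch_timer *sum,
			const struct sketch_timer *t)
{
	if (!t->interval_count)
		return; /* this thread never used the timer */

	sum->total_ns += t->total_ns;
	if (!sum->interval_count || t->min_ns < sum->min_ns)
		sum->min_ns = t->min_ns;
	if (!sum->interval_count || t->max_ns > sum->max_ns)
		sum->max_ns = t->max_ns;
	sum->interval_count += t->interval_count;
}
```

With a zero-initialized timer, calling start at 100, a nested start at
150, then stop at 200 and stop at 400 records a single interval of
300 ns, because only the outermost start/stop pair counts.
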
 Documentation/technical/api-trace2.txt |  90 ++++++++++++
 Makefile                               |   1 +
 t/helper/test-trace2.c                 |  98 +++++++++++++
 t/t0211-trace2-perf.sh                 |  49 +++++++
 t/t0211/scrub_perf.perl                |   6 +
 trace2.c                               |  75 ++++++++++
 trace2.h                               |  43 ++++++
 trace2/tr2_tgt.h                       |   7 +
 trace2/tr2_tgt_event.c                 |  26 ++++
 trace2/tr2_tgt_normal.c                |  23 ++++
 trace2/tr2_tgt_perf.c                  |  24 ++++
 trace2/tr2_tls.c                       |  10 ++
 trace2/tr2_tls.h                       |  10 ++
 trace2/tr2_tmr.c                       | 182 +++++++++++++++++++++++++
 trace2/tr2_tmr.h                       | 140 +++++++++++++++++++
 15 files changed, 784 insertions(+)
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 9d43909d068..75ce6f45603 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -769,6 +769,42 @@ The "value" field may be an integer or a string.
 }
 ------------
 
+`"th_timer"`::
+	This event logs the amount of time that a stopwatch timer was
+	running in the thread.  This event is generated when a thread
+	exits for timers that requested per-thread events.
++
+------------
+{
+	"event":"th_timer",
+	...
+	"category":"my_category",
+	"name":"my_timer",
+	"intervals":5,         # number of times it was started/stopped
+	"t_total":0.052741,    # total time in seconds it was running
+	"t_min":0.010061,      # shortest interval
+	"t_max":0.011648       # longest interval
+}
+------------
+
+`"timer"`::
+	This event logs the amount of time that a stopwatch timer was
+	running aggregated across all threads.  This event is generated
+	when the process exits.
++
+------------
+{
+	"event":"timer",
+	...
+	"category":"my_category",
+	"name":"my_timer",
+	"intervals":5,         # number of times it was started/stopped
+	"t_total":0.052741,    # total time in seconds it was running
+	"t_min":0.010061,      # shortest interval
+	"t_max":0.011648       # longest interval
+}
+------------
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
@@ -1200,6 +1236,60 @@ d0 | main                     | data         | r0  |  0.002126 |  0.002126 | fsy
 d0 | main                     | exit         |     |  0.000470 |           |              | code:0
 d0 | main                     | atexit       |     |  0.000477 |           |              | code:0
 ----------------
+
+Stopwatch Timer Events::
+
+	Measure the time spent in a function call or span of code
+	that might be called from many places within the code
+	throughout the life of the process.
++
+----------------
+static void expensive_function(void)
+{
+	trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+	...
+	sleep_millisec(1000); // Do something expensive
+	...
+	trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+}
+
+static int ut_100timer(int argc, const char **argv)
+{
+	...
+
+	expensive_function();
+
+	// Do something else 1...
+
+	expensive_function();
+
+	// Do something else 2...
+
+	expensive_function();
+
+	return 0;
+}
+----------------
++
+In this example, we measure the total time spent in
+`expensive_function()` regardless of when it is called
+in the overall flow of the program.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ t/helper/test-tool trace2 100timer 3 1000
+...
+$ cat ~/log.perf
+d0 | main                     | version      |     |           |           |              | ...
+d0 | main                     | start        |     |  0.001453 |           |              | t/helper/test-tool trace2 100timer 3 1000
+d0 | main                     | cmd_name     |     |           |           |              | trace2 (trace2)
+d0 | main                     | exit         |     |  3.003667 |           |              | code:0
+d0 | main                     | timer        |     |           |           | test         | name:test1 intervals:3 total:3.001686 min:1.000254 max:1.000929
+d0 | main                     | atexit       |     |  3.003796 |           |              | code:0
+----------------
+
+
 == Future Work
 
 === Relationship to the Existing Trace Api (api-trace.txt)
diff --git a/Makefile b/Makefile
index cac3452edb9..820649bf62a 100644
--- a/Makefile
+++ b/Makefile
@@ -1102,6 +1102,7 @@ LIB_OBJS += trace2/tr2_tgt_event.o
 LIB_OBJS += trace2/tr2_tgt_normal.o
 LIB_OBJS += trace2/tr2_tgt_perf.o
 LIB_OBJS += trace2/tr2_tls.o
+LIB_OBJS += trace2/tr2_tmr.o
 LIB_OBJS += trailer.o
 LIB_OBJS += transport-helper.o
 LIB_OBJS += transport.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index a714130ece7..f951b9e97d7 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -228,6 +228,101 @@ static int ut_010bug_BUG(int argc, const char **argv)
 	BUG("a %s message", "BUG");
 }
 
+/*
+ * Single-threaded timer test.  Create several intervals using the
+ * TEST1 timer.  The test script can verify that an aggregate Trace2
+ * "timer" event is emitted indicating that we started+stopped the
+ * timer the requested number of times.
+ */
+static int ut_100timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay>";
+
+	int count = 0;
+	int delay = 0;
+	int k;
+
+	if (argc != 2)
+		die("%s", usage_error);
+	if (get_i(&count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&delay, argv[1]))
+		die("%s", usage_error);
+
+	for (k = 0; k < count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+		sleep_millisec(delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+	}
+
+	return 0;
+}
+
+struct ut_101_data {
+	int count;
+	int delay;
+};
+
+static void *ut_101timer_thread_proc(void *_ut_101_data)
+{
+	struct ut_101_data *data = _ut_101_data;
+	int k;
+
+	trace2_thread_start("ut_101");
+
+	for (k = 0; k < data->count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST2);
+		sleep_millisec(data->delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST2);
+	}
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+/*
+ * Multi-threaded timer test.  Create several threads that each create
+ * several intervals using the TEST2 timer.  The test script can verify
+ * that individual Trace2 "th_timer" events for each thread and an
+ * aggregate "timer" event are generated.
+ */
+static int ut_101timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay> <threads>";
+
+	struct ut_101_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.delay, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_101timer_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -248,6 +343,9 @@ static struct unit_test ut_table[] = {
 	{ ut_008bug,      "008bug",    "" },
 	{ ut_009bug_BUG,  "009bug_BUG","" },
 	{ ut_010bug_BUG,  "010bug_BUG","" },
+
+	{ ut_100timer,    "100timer",  "<count> <ms_delay>" },
+	{ ut_101timer,    "101timer",  "<count> <ms_delay> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 22d0845544e..5c28424e657 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -173,4 +173,53 @@ test_expect_success 'using global config, perf stream, return code 0' '
 	test_cmp expect actual
 '
 
+# Exercise the stopwatch timers in a loop and confirm that we have
+# as many start/stop intervals as expected.  We cannot really test the
+# actual (total, min, max) timer values, so we have to assume that they
+# are good, but we can verify the interval count.
+#
+# The timer "test/test1" should only emit a global summary "timer" event.
+# The timer "test/test2" should emit per-thread "th_timer" events and a
+# global summary "timer" event.
+
+have_timer_event () {
+	thread=$1 event=$2 category=$3 name=$4 intervals=$5 file=$6 &&
+
+	pattern="d0|${thread}|${event}||||${category}|name:${name} intervals:${intervals}" &&
+
+	grep "${pattern}" ${file}
+}
+
+test_expect_success 'stopwatch timer test/test1' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the timer "test1" 5 times from "main".
+	test-tool trace2 100timer 5 10 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_timer_event "main" "timer" "test" "test1" 5 actual
+'
+
+test_expect_success 'stopwatch timer test/test2' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the timer "test2" 5 times each in 3 threads.
+	test-tool trace2 101timer 5 10 3 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	# So we should have 3 per-thread events of 5 each.
+	have_timer_event "th01:ut_101" "th_timer" "test" "test2" 5 actual &&
+	have_timer_event "th02:ut_101" "th_timer" "test" "test2" 5 actual &&
+	have_timer_event "th03:ut_101" "th_timer" "test" "test2" 5 actual &&
+
+	# And we should have 15 total uses.
+	have_timer_event "main" "timer" "test" "test2" 15 actual
+'
+
 test_done
diff --git a/t/t0211/scrub_perf.perl b/t/t0211/scrub_perf.perl
index 299999f0f89..7a50bae6463 100644
--- a/t/t0211/scrub_perf.perl
+++ b/t/t0211/scrub_perf.perl
@@ -64,6 +64,12 @@ while (<>) {
 	    goto SKIP_LINE;
 	}
     }
+    elsif ($tokens[$col_event] =~ m/timer/) {
+	# This also captures "th_timer" events
+	$tokens[$col_rest] =~ s/ total:\d+\.\d*/ total:_T_TOTAL_/;
+	$tokens[$col_rest] =~ s/ min:\d+\.\d*/ min:_T_MIN_/;
+	$tokens[$col_rest] =~ s/ max:\d+\.\d*/ max:_T_MAX_/;
+    }
 
     # t_abs and t_rel are either blank or a float.  Replace the float
     # with a constant for matching the HEREDOC in the test script.
diff --git a/trace2.c b/trace2.c
index c8e5acced2a..c564ff49bb9 100644
--- a/trace2.c
+++ b/trace2.c
@@ -13,6 +13,7 @@
 #include "trace2/tr2_sysenv.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static int trace2_enabled;
 
@@ -83,6 +84,23 @@ static void tr2_tgt_disable_builtins(void)
 		tgt_j->pfn_term();
 }
 
+/*
+ * The signature of this function must match the pfn_timer
+ * method in the targets.  (Think of this as an apply operation
+ * across the set of active targets.)
+ */
+static void tr2_tgt_emit_a_timer(const struct tr2_timer_metadata *meta,
+				 const struct tr2_timer *timer,
+				 int is_final_data)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_timer)
+			tgt_j->pfn_timer(meta, timer, is_final_data);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -110,6 +128,26 @@ static void tr2main_atexit_handler(void)
 	 */
 	tr2tls_pop_unwind_self();
 
+	/*
+	 * Some timers want per-thread details.  If the main thread
+	 * used one of those timers, emit the details now (before
+	 * we emit the aggregate timer values).
+	 */
+	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+
+	/*
+	 * Add stopwatch timer data for the main thread to the final
+	 * totals.  And then emit the final timer values.
+	 *
+	 * Technically, we shouldn't need to hold the lock to update
+	 * and output the final_timer_block (since all other threads
+	 * should be dead by now), but it doesn't hurt anything.
+	 */
+	tr2tls_lock();
+	tr2_update_final_timers();
+	tr2_emit_final_timers(tr2_tgt_emit_a_timer);
+	tr2tls_unlock();
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_atexit)
 			tgt_j->pfn_atexit(us_elapsed_absolute,
@@ -541,6 +579,21 @@ void trace2_thread_exit_fl(const char *file, int line)
 	tr2tls_pop_unwind_self();
 	us_elapsed_thread = tr2tls_region_elasped_self(us_now);
 
+	/*
+	 * Some timers want per-thread details.  If this thread used
+	 * one of those timers, emit the details now.
+	 */
+	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+
+	/*
+	 * Add stopwatch timer data from the current (non-main) thread
+	 * to the final totals.  (We'll accumulate data for the main
+	 * thread later during "atexit".)
+	 */
+	tr2tls_lock();
+	tr2_update_final_timers();
+	tr2tls_unlock();
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_thread_exit_fl)
 			tgt_j->pfn_thread_exit_fl(file, line,
@@ -795,6 +848,28 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...)
 	va_end(ap);
 }
 
+void trace2_timer_start(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("trace2_timer_start: invalid timer id: %d", tid);
+
+	tr2_start_timer(tid);
+}
+
+void trace2_timer_stop(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("trace2_timer_stop: invalid timer id: %d", tid);
+
+	tr2_stop_timer(tid);
+}
+
 const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
diff --git a/trace2.h b/trace2.h
index fe39dcb5849..2d146fb32fc 100644
--- a/trace2.h
+++ b/trace2.h
@@ -51,6 +51,7 @@ struct json_writer;
  * [] trace2_region*    -- emit region nesting messages.
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
+ * [] trace2_timer*     -- stopwatch timers (messages are deferred).
  */
 
 /*
@@ -485,6 +486,48 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...);
 
 #define trace2_printf(...) trace2_printf_fl(__FILE__, __LINE__, __VA_ARGS__)
 
+/*
+ * Define the set of stopwatch timers.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any values added to this enum must also be added to the
+ * `tr2_timer_metadata[]` in `trace2/tr2_tmr.c`.
+ */
+enum trace2_timer_id {
+	/*
+	 * Define two timers for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_TIMER_ID_TEST2,     /* emits summary and thread events */
+
+	/* Add additional timer definitions before here. */
+	TRACE2_NUMBER_OF_TIMERS
+};
+
+/*
+ * Start/Stop the indicated stopwatch timer in the current thread.
+ *
+ * The time spent by the current thread between the _start and _stop
+ * calls will be added to the thread's partial sum for this timer.
+ *
+ * Timer events are emitted at thread and program exit.
+ *
+ * Note: Since the stopwatch API routines do not generate individual
+ * events, they do not take (file, line) arguments.  Similarly, the
+ * category and timer name values are defined at compile-time in the
+ * timer definitions array, so they are not needed here in the API.
+ */
+void trace2_timer_start(enum trace2_timer_id tid);
+void trace2_timer_stop(enum trace2_timer_id tid);
+
 /*
  * Optional platform-specific code to dump information about the
  * current and any parent process(es).  This is intended to allow
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 65f94e15748..2a80bef0df5 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -4,6 +4,8 @@
 struct child_process;
 struct repository;
 struct json_writer;
+struct tr2_timer_metadata;
+struct tr2_timer;
 
 /*
  * Function prototypes for a TRACE2 "target" vtable.
@@ -96,6 +98,10 @@ typedef void(tr2_tgt_evt_printf_va_fl_t)(const char *file, int line,
 					 uint64_t us_elapsed_absolute,
 					 const char *fmt, va_list ap);
 
+typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta,
+				  const struct tr2_timer *timer,
+				  int is_final_data);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -132,6 +138,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_fl_t                   *pfn_data_fl;
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
+	tr2_tgt_evt_timer_t                     *pfn_timer;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 52f9356c695..1196da89ba4 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -9,6 +9,7 @@
 #include "trace2/tr2_sysenv.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_event = {
 	.sysenv_var = TR2_SYSENV_EVENT,
@@ -617,6 +618,30 @@ static void fn_data_json_fl(const char *file, int line,
 	}
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct json_writer jw = JSON_WRITER_INIT;
+	double t_total = ((double)timer->total_ns) / 1000000000.0;
+	double t_min = ((double)timer->min_ns) / 1000000000.0;
+	double t_max = ((double)timer->max_ns) / 1000000000.0;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	jw_object_string(&jw, "category", meta->category);
+	jw_object_string(&jw, "name", meta->name);
+	jw_object_intmax(&jw, "intervals", timer->interval_count);
+	jw_object_double(&jw, "t_total", 6, t_total);
+	jw_object_double(&jw, "t_min", 6, t_min);
+	jw_object_double(&jw, "t_max", 6, t_max);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	.pdst = &tr2dst_event,
 
@@ -648,4 +673,5 @@ struct tr2_tgt tr2_tgt_event = {
 	.pfn_data_fl = fn_data_fl,
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = NULL,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 69f80330778..3888c10ef50 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -8,6 +8,7 @@
 #include "trace2/tr2_tbuf.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_normal = {
 	.sysenv_var = TR2_SYSENV_NORMAL,
@@ -329,6 +330,27 @@ static void fn_printf_va_fl(const char *file, int line,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct strbuf buf_payload = STRBUF_INIT;
+	double t_total = ((double)timer->total_ns) / 1000000000.0;
+	double t_min = ((double)timer->min_ns) / 1000000000.0;
+	double t_max = ((double)timer->max_ns) / 1000000000.0;
+
+	strbuf_addf(&buf_payload, ("%s %s/%s"
+				   " intervals:%"PRIu64
+				   " total:%8.6f min:%8.6f max:%8.6f"),
+		    event_name, meta->category, meta->name,
+		    timer->interval_count,
+		    t_total, t_min, t_max);
+
+	normal_io_write_fl(__FILE__, __LINE__, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_normal = {
 	.pdst = &tr2dst_normal,
 
@@ -360,4 +382,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	.pfn_data_fl = NULL,
 	.pfn_data_json_fl = NULL,
 	.pfn_printf_va_fl = fn_printf_va_fl,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index fdeb3292d3a..064aefbbebb 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -10,6 +10,7 @@
 #include "trace2/tr2_tbuf.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_perf = {
 	.sysenv_var = TR2_SYSENV_PERF,
@@ -559,6 +560,28 @@ static void fn_printf_va_fl(const char *file, int line,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct strbuf buf_payload = STRBUF_INIT;
+	double t_total = ((double)timer->total_ns) / 1000000000.0;
+	double t_min = ((double)timer->min_ns) / 1000000000.0;
+	double t_max = ((double)timer->max_ns) / 1000000000.0;
+
+	strbuf_addf(&buf_payload, ("name:%s"
+				   " intervals:%"PRIu64
+				   " total:%8.6f min:%8.6f max:%8.6f"),
+		    meta->name,
+		    timer->interval_count,
+		    t_total, t_min, t_max);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, NULL, NULL,
+			 meta->category, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	.pdst = &tr2dst_perf,
 
@@ -590,4 +613,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	.pfn_data_fl = fn_data_fl,
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = fn_printf_va_fl,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 89437e773f6..1aceb36b010 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -180,3 +180,13 @@ int tr2tls_locked_increment(int *p)
 
 	return current_value;
 }
+
+void tr2tls_lock(void)
+{
+	pthread_mutex_lock(&tr2tls_mutex);
+}
+
+void tr2tls_unlock(void)
+{
+	pthread_mutex_unlock(&tr2tls_mutex);
+}
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index be0bc73d08f..4f8e24f1749 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_tmr.h"
 
 /*
  * Notice: the term "TLS" refers to "thread-local storage" in the
@@ -14,6 +15,9 @@ struct tr2tls_thread_ctx {
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
+	struct tr2_timer_block timer_block;
+	unsigned int used_any_timer:1;
+	unsigned int used_any_per_thread_timer:1;
 	char thread_name[FLEX_ARRAY];
 };
 
@@ -100,4 +104,10 @@ int tr2tls_locked_increment(int *p);
  */
 void tr2tls_start_process_clock(void);
 
+/*
+ * Explicitly lock/unlock our mutex.
+ */
+void tr2tls_lock(void);
+void tr2tls_unlock(void);
+
 #endif /* TR2_TLS_H */
diff --git a/trace2/tr2_tmr.c b/trace2/tr2_tmr.c
new file mode 100644
index 00000000000..786762dfd26
--- /dev/null
+++ b/trace2/tr2_tmr.c
@@ -0,0 +1,182 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tgt.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
+
+#define MY_MAX(a, b) ((a) > (b) ? (a) : (b))
+#define MY_MIN(a, b) ((a) < (b) ? (a) : (b))
+
+/*
+ * A global timer block to aggregate values from the partial sums from
+ * each thread.
+ */
+static struct tr2_timer_block final_timer_block; /* access under tr2tls_mutex */
+
+/*
+ * Define metadata for each stopwatch timer.
+ *
+ * This array must match "enum trace2_timer_id" and the values
+ * in "struct tr2_timer_block.timer[*]".
+ */
+static struct tr2_timer_metadata tr2_timer_metadata[TRACE2_NUMBER_OF_TIMERS] = {
+	[TRACE2_TIMER_ID_TEST1] = {
+		.category = "test",
+		.name = "test1",
+		.want_per_thread_events = 0,
+	},
+	[TRACE2_TIMER_ID_TEST2] = {
+		.category = "test",
+		.name = "test2",
+		.want_per_thread_events = 1,
+	},
+
+	/* Add additional metadata before here. */
+};
+
+void tr2_start_timer(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_timer *t = &ctx->timer_block.timer[tid];
+
+	t->recursion_count++;
+	if (t->recursion_count > 1)
+		return; /* ignore recursive starts */
+
+	t->start_ns = getnanotime();
+}
+
+void tr2_stop_timer(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_timer *t = &ctx->timer_block.timer[tid];
+	uint64_t ns_now;
+	uint64_t ns_interval;
+
+	assert(t->recursion_count > 0);
+
+	t->recursion_count--;
+	if (t->recursion_count)
+		return; /* still in recursive call(s) */
+
+	ns_now = getnanotime();
+	ns_interval = ns_now - t->start_ns;
+
+	t->total_ns += ns_interval;
+
+	/*
+	 * min_ns was initialized to zero (in the xcalloc()) rather
+	 * than UINT_MAX when the block of timers was allocated,
+	 * so we should always set both the min_ns and max_ns values
+	 * the first time that the timer is used.
+	 */
+	if (!t->interval_count) {
+		t->min_ns = ns_interval;
+		t->max_ns = ns_interval;
+	} else {
+		t->min_ns = MY_MIN(ns_interval, t->min_ns);
+		t->max_ns = MY_MAX(ns_interval, t->max_ns);
+	}
+
+	t->interval_count++;
+
+	ctx->used_any_timer = 1;
+	if (tr2_timer_metadata[tid].want_per_thread_events)
+		ctx->used_any_per_thread_timer = 1;
+}
+
+void tr2_update_final_timers(void)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_timer_id tid;
+
+	if (!ctx->used_any_timer)
+		return;
+
+	/*
+	 * Accessing `final_timer_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) {
+		struct tr2_timer *t_final = &final_timer_block.timer[tid];
+		struct tr2_timer *t = &ctx->timer_block.timer[tid];
+
+		if (t->recursion_count) {
+			/*
+			 * The current thread is exiting with
+			 * timer[tid] still running.
+			 *
+			 * Technically, this is a bug, but I'm going
+			 * to ignore it.
+			 *
+			 * I don't think it is worth calling die()
+			 * for.  I don't think it is worth killing the
+			 * process for this bookkeeping error.  We
+			 * might want to call warning(), but I'm going
+			 * to wait on that.
+			 *
+			 * The downside here is that total_ns won't
+			 * include the current open interval (now -
+			 * start_ns).  I can live with that.
+			 */
+		}
+
+		if (!t->interval_count)
+			continue; /* this timer was not used by this thread */
+
+		t_final->total_ns += t->total_ns;
+
+		/*
+		 * final_timer_block.timer[tid].min_ns was initialized to
+		 * zero rather than UINT_MAX, so we should
+		 * always set both the min_ns and max_ns values the first time
+		 * that we add a partial sum into it.
+		 */
+		if (!t_final->interval_count) {
+			t_final->min_ns = t->min_ns;
+			t_final->max_ns = t->max_ns;
+		} else {
+			t_final->min_ns = MY_MIN(t_final->min_ns, t->min_ns);
+			t_final->max_ns = MY_MAX(t_final->max_ns, t->max_ns);
+		}
+
+		t_final->interval_count += t->interval_count;
+	}
+}
+
+void tr2_emit_per_thread_timers(tr2_tgt_evt_timer_t *fn_apply)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_timer_id tid;
+
+	if (!ctx->used_any_per_thread_timer)
+		return;
+
+	/*
+	 * For each timer, if the timer wants per-thread events and
+	 * this thread used it, emit it.
+	 */
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++)
+		if (tr2_timer_metadata[tid].want_per_thread_events &&
+		    ctx->timer_block.timer[tid].interval_count)
+			fn_apply(&tr2_timer_metadata[tid],
+				 &ctx->timer_block.timer[tid],
+				 0);
+}
+
+void tr2_emit_final_timers(tr2_tgt_evt_timer_t *fn_apply)
+{
+	enum trace2_timer_id tid;
+
+	/*
+	 * Accessing `final_timer_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++)
+		if (final_timer_block.timer[tid].interval_count)
+			fn_apply(&tr2_timer_metadata[tid],
+				 &final_timer_block.timer[tid],
+				 1);
+}
diff --git a/trace2/tr2_tmr.h b/trace2/tr2_tmr.h
new file mode 100644
index 00000000000..d5753576134
--- /dev/null
+++ b/trace2/tr2_tmr.h
@@ -0,0 +1,140 @@
+#ifndef TR2_TMR_H
+#define TR2_TMR_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow "stopwatch" timers.
+ *
+ * Timers can be used to measure "interesting" activity that does not
+ * fit the "region" model, such as code called from many different
+ * regions (like zlib) and/or where data for individual calls are not
+ * interesting or are too numerous to be efficiently logged.
+ *
+ * Timer values are accumulated during program execution and emitted
+ * to the Trace2 logs at program exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set of
+ * timers and timer ids using a "timer block" array in thread-local
+ * storage.  This gives us constant time access to each timer within
+ * each thread, since we want start/stop operations to be as fast as
+ * possible.  This lets us avoid the complexities of dynamically
+ * allocating a timer on the first use by a thread and/or possibly
+ * sharing that timer definition with other concurrent threads.
+ * However, this does require that we define the set of timers at
+ * compile time.
+ *
+ * Each thread uses the timer block in its thread-local storage to
+ * compute partial sums for each timer (without locking).  When a
+ * thread exits, those partial sums are (under lock) added to the
+ * global final sum.
+ *
+ * Using this "timer block" model costs ~48 bytes per timer per thread
+ * (we have about six uint64 fields per timer).  This does increase
+ * the size of the thread-local storage block, but it is allocated (at
+ * thread create time) and not on the thread stack, so I'm not worried
+ * about the size.
+ *
+ * Partial sums for each timer are optionally emitted when a thread
+ * exits.
+ *
+ * Final sums for each timer are emitted between the "exit" and
+ * "atexit" events.
+ *
+ * A parallel "timer metadata" table contains the "category" and "name"
+ * fields for each timer.  This eliminates the need to include those
+ * args in the various timer APIs.
+ */
+
+/*
+ * The definition of an individual timer as used by an individual
+ * thread.
+ */
+struct tr2_timer {
+	/*
+	 * Total elapsed time for this timer in this thread in nanoseconds.
+	 */
+	uint64_t total_ns;
+
+	/*
+	 * The maximum and minimum interval values observed for this
+	 * timer in this thread.
+	 */
+	uint64_t min_ns;
+	uint64_t max_ns;
+
+	/*
+	 * The value of the clock when this timer was started in this
+	 * thread.  (Undefined when the timer is not active in this
+	 * thread.)
+	 */
+	uint64_t start_ns;
+
+	/*
+	 * Number of times that this timer has been started and stopped
+	 * in this thread.  (Recursive starts are ignored.)
+	 */
+	uint64_t interval_count;
+
+	/*
+	 * Number of nested starts on the stack in this thread.  (We
+	 * ignore recursive starts and use this to track the recursive
+	 * calls.)
+	 */
+	unsigned int recursion_count;
+};
+
+/*
+ * Metadata for a timer.
+ */
+struct tr2_timer_metadata {
+	const char *category;
+	const char *name;
+
+	/*
+	 * True if we should emit per-thread events for this timer
+	 * when individual threads exit.
+	 */
+	unsigned int want_per_thread_events:1;
+};
+
+/*
+ * A compile-time fixed-size block of timers to insert into
+ * thread-local storage.  This wrapper is used to avoid quirks
+ * of C and the usual need to pass an array size argument.
+ */
+struct tr2_timer_block {
+	struct tr2_timer timer[TRACE2_NUMBER_OF_TIMERS];
+};
+
+/*
+ * Private routines used by trace2.c to actually start/stop an
+ * individual timer in the current thread.
+ */
+void tr2_start_timer(enum trace2_timer_id tid);
+void tr2_stop_timer(enum trace2_timer_id tid);
+
+/*
+ * Add the current thread's timer data to the global totals.
+ * This is called during thread-exit.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_update_final_timers(void);
+
+/*
+ * Emit per-thread timer data for the current thread.
+ * This is called during thread-exit.
+ */
+void tr2_emit_per_thread_timers(tr2_tgt_evt_timer_t *fn_apply);
+
+/*
+ * Emit global total timer values.
+ * This is called during atexit handling.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_emit_final_timers(tr2_tgt_evt_timer_t *fn_apply);
+
+#endif /* TR2_TMR_H */
-- 
gitgitgadget



* [PATCH 9/9] trace2: add global counter mechanism
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                   ` (7 preceding siblings ...)
  2022-10-04 16:20 ` [PATCH 8/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
@ 2022-10-04 16:20 ` Jeff Hostetler via GitGitGadget
  2022-10-05 13:04 ` [PATCH 0/9] Trace2 timers and counters and some cleanup Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-04 16:20 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add global counters mechanism to Trace2.

The Trace2 counters mechanism adds the ability to create a set of
global counter variables and an API to increment them efficiently.
Counters can optionally report per-thread usage in addition to the sum
across all threads.

Counter events are emitted to the Trace2 logs when a thread exits and
at process exit.

Counters are an alternative to `data` and `data_json` events.

Counters are useful when you want to measure something across the life
of the process, when you don't want per-measurement events for
performance reasons, when the data does not fit conveniently within a
region, or when your control flow does not easily let you write the
final total.  For example, you might use this to report the number of
calls to unzip() or the number of de-delta steps during a checkout.
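
As a rough, standalone illustration of the "per-thread partial sum,
fold into a global total under lock" model described above, consider
the sketch below.  It is not the code in this patch: the names
`counter_block`, `counter_add()`, `fold_into_final()` and
`run_workers()` are hypothetical, and the per-thread block simply lives
on the worker's stack instead of in the Trace2 thread context.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_WORKERS 16 /* arbitrary cap for this sketch */

/* Hypothetical counter ids; contiguous from 0 so they index an array. */
enum counter_id { COUNTER_TEST1 = 0, COUNTER_TEST2, NUMBER_OF_COUNTERS };

struct counter_block {
	uint64_t value[NUMBER_OF_COUNTERS];
};

static struct counter_block final_block; /* access under final_mutex */
static pthread_mutex_t final_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Lock-free: a thread only ever touches its own block. */
static void counter_add(struct counter_block *self, enum counter_id cid,
			uint64_t value)
{
	self->value[cid] += value;
}

/* Called once per thread at exit: add partial sums to the totals. */
static void fold_into_final(const struct counter_block *self)
{
	int cid;

	pthread_mutex_lock(&final_mutex);
	for (cid = 0; cid < NUMBER_OF_COUNTERS; cid++)
		final_block.value[cid] += self->value[cid];
	pthread_mutex_unlock(&final_mutex);
}

struct worker_args {
	uint64_t v1;
	uint64_t v2;
};

static void *worker(void *arg)
{
	const struct worker_args *a = arg;
	struct counter_block self = { { 0 } };

	counter_add(&self, COUNTER_TEST2, a->v1);
	counter_add(&self, COUNTER_TEST2, a->v2);
	fold_into_final(&self);
	return NULL;
}

/*
 * Start up to MAX_WORKERS threads, wait for them, and return the
 * process-wide total for COUNTER_TEST2 (cumulative across calls).
 */
uint64_t run_workers(int nr_threads, uint64_t v1, uint64_t v2)
{
	struct worker_args args = { v1, v2 };
	pthread_t tids[MAX_WORKERS];
	uint64_t total;
	int nr_started = 0;
	int k;

	if (nr_threads > MAX_WORKERS)
		nr_threads = MAX_WORKERS;

	for (k = 0; k < nr_threads; k++) {
		if (pthread_create(&tids[k], NULL, worker, &args))
			break;
		nr_started++;
	}
	for (k = 0; k < nr_started; k++)
		pthread_join(tids[k], NULL);

	pthread_mutex_lock(&final_mutex);
	total = final_block.value[COUNTER_TEST2];
	pthread_mutex_unlock(&final_mutex);
	return total;
}
```

Calling `run_workers(3, 7, 13)` once mirrors the multi-threaded test
added to t0211: each thread contributes a partial sum of 20 and the
final total is only complete after every thread has been joined.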

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt |  31 ++++++++
 Makefile                               |   1 +
 t/helper/test-trace2.c                 |  89 +++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  46 +++++++++++
 trace2.c                               |  52 +++++++++++--
 trace2.h                               |  37 +++++++++
 trace2/tr2_ctr.c                       | 101 ++++++++++++++++++++++++
 trace2/tr2_ctr.h                       | 104 +++++++++++++++++++++++++
 trace2/tr2_tgt.h                       |   7 ++
 trace2/tr2_tgt_event.c                 |  19 +++++
 trace2/tr2_tgt_normal.c                |  16 ++++
 trace2/tr2_tgt_perf.c                  |  17 ++++
 trace2/tr2_tls.h                       |   4 +
 13 files changed, 517 insertions(+), 7 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 75ce6f45603..de5fc250595 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -805,6 +805,37 @@ The "value" field may be an integer or a string.
 }
 ------------
 
+`"th_counter"`::
+	This event logs the value of a counter variable in a thread.
+	This event is generated when a thread exits for counters that
+	requested per-thread events.
++
+------------
+{
+	"event":"th_counter",
+	...
+	"category":"my_category",
+	"name":"my_counter",
+	"count":23
+}
+------------
+
+`"counter"`::
+	This event logs the value of a counter variable across all threads.
+	This event is generated when the process exits.  The total value
+	reported here is the sum across all threads.
++
+------------
+{
+	"event":"counter",
+	...
+	"category":"my_category",
+	"name":"my_counter",
+	"count":23
+}
+------------
+
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
diff --git a/Makefile b/Makefile
index 820649bf62a..29ab417ca3a 100644
--- a/Makefile
+++ b/Makefile
@@ -1094,6 +1094,7 @@ LIB_OBJS += trace.o
 LIB_OBJS += trace2.o
 LIB_OBJS += trace2/tr2_cfg.o
 LIB_OBJS += trace2/tr2_cmd_name.o
+LIB_OBJS += trace2/tr2_ctr.o
 LIB_OBJS += trace2/tr2_dst.o
 LIB_OBJS += trace2/tr2_sid.o
 LIB_OBJS += trace2/tr2_sysenv.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index f951b9e97d7..1b092c60714 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -323,6 +323,92 @@ static int ut_101timer(int argc, const char **argv)
 	return 0;
 }
 
+/*
+ * Single-threaded counter test.  Add several values to the TEST1 counter.
+ * The test script can verify that the final sum is reported in the "counter"
+ * event.
+ */
+static int ut_200counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> [<v2> [...]]";
+	int value;
+	int k;
+
+	if (argc < 1)
+		die("%s", usage_error);
+
+	for (k = 0; k < argc; k++) {
+		if (get_i(&value, argv[k]))
+			die("invalid value[%s] -- %s",
+			    argv[k], usage_error);
+		trace2_counter_add(TRACE2_COUNTER_ID_TEST1, value);
+	}
+
+	return 0;
+}
+
+/*
+ * Multi-threaded counter test.  Create several threads that each increment
+ * the TEST2 global counter.  The test script can verify that an individual
+ * "th_counter" event is generated with a partial sum for each thread and
+ * that a final aggregate "counter" event is generated.
+ */
+
+struct ut_201_data {
+	int v1;
+	int v2;
+};
+
+static void *ut_201counter_thread_proc(void *_ut_201_data)
+{
+	struct ut_201_data *data = _ut_201_data;
+
+	trace2_thread_start("ut_201");
+
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v1);
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v2);
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+static int ut_201counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> <v2> <threads>";
+
+	struct ut_201_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.v1, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.v2, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_201counter_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -346,6 +432,9 @@ static struct unit_test ut_table[] = {
 
 	{ ut_100timer,    "100timer",  "<count> <ms_delay>" },
 	{ ut_101timer,    "101timer",  "<count> <ms_delay> <threads>" },
+
+	{ ut_200counter,  "200counter", "<v1> [<v2> [<v3> [...]]]" },
+	{ ut_201counter,  "201counter", "<v1> <v2> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 5c28424e657..0b3436e8cac 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -222,4 +222,50 @@ test_expect_success 'stopwatch timer test/test2' '
 	have_timer_event "main" "timer" "test" "test2" 15 actual
 '
 
+# Exercise the global counters and confirm that we get the expected values.
+#
+# The counter "test/test1" should only emit a global summary "counter" event.
+# The counter "test/test2" could emit per-thread "th_counter" events and a
+# global summary "counter" event.
+
+have_counter_event () {
+	thread=$1 event=$2 category=$3 name=$4 value=$5 file=$6 &&
+
+	pattern="d0|${thread}|${event}||||${category}|name:${name} value:${value}" &&
+
+	grep "${pattern}" ${file}
+}
+
+test_expect_success 'global counter test/test1' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the counter "test1" and add n integers.
+	test-tool trace2 200counter 1 2 3 4 5 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_counter_event "main" "counter" "test" "test1" 15 actual
+'
+
+test_expect_success 'global counter test/test2' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Add 2 integers to the counter "test2" in each of 3 threads.
+	test-tool trace2 201counter 7 13 3 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	# So we should have 3 per-thread events of 20 each.
+	have_counter_event "th01:ut_201" "th_counter" "test" "test2" 20 actual &&
+	have_counter_event "th02:ut_201" "th_counter" "test" "test2" 20 actual &&
+	have_counter_event "th03:ut_201" "th_counter" "test" "test2" 20 actual &&
+
+	# And we should have a single event with the total across all threads.
+	have_counter_event "main" "counter" "test" "test2" 60 actual
+'
+
 test_done
diff --git a/trace2.c b/trace2.c
index c564ff49bb9..2376500ea05 100644
--- a/trace2.c
+++ b/trace2.c
@@ -8,6 +8,7 @@
 #include "version.h"
 #include "trace2/tr2_cfg.h"
 #include "trace2/tr2_cmd_name.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
 #include "trace2/tr2_sysenv.h"
@@ -101,6 +102,22 @@ static void tr2_tgt_emit_a_timer(const struct tr2_timer_metadata *meta,
 			tgt_j->pfn_timer(meta, timer, is_final_data);
 }
 
+/*
+ * The signature of this function must match the pfn_counter
+ * method in the targets.
+ */
+static void tr2_tgt_emit_a_counter(const struct tr2_counter_metadata *meta,
+				   const struct tr2_counter *counter,
+				   int is_final_data)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_counter)
+			tgt_j->pfn_counter(meta, counter, is_final_data);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -132,20 +149,26 @@ static void tr2main_atexit_handler(void)
 	 * Some timers want per-thread details.  If the main thread
 	 * used one of those timers, emit the details now (before
 	 * we emit the aggregate timer values).
+	 *
+	 * Likewise for counters.
 	 */
 	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_per_thread_counters(tr2_tgt_emit_a_counter);
 
 	/*
-	 * Add stopwatch timer data for the main thread to the final
-	 * totals.  And then emit the final timer values.
+	 * Add stopwatch timer and counter data for the main thread to
+	 * the final totals.  And then emit the final values.
 	 *
 	 * Technically, we shouldn't need to hold the lock to update
-	 * and output the final_timer_block (since all other threads
-	 * should be dead by now), but it doesn't hurt anything.
+	 * and output the final_timer_block and final_counter_block
+	 * (since all other threads should be dead by now), but it
+	 * doesn't hurt anything.
 	 */
 	tr2tls_lock();
 	tr2_update_final_timers();
+	tr2_update_final_counters();
 	tr2_emit_final_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_final_counters(tr2_tgt_emit_a_counter);
 	tr2tls_unlock();
 
 	for_each_wanted_builtin (j, tgt_j)
@@ -582,16 +605,20 @@ void trace2_thread_exit_fl(const char *file, int line)
 	/*
 	 * Some timers want per-thread details.  If this thread used
 	 * one of those timers, emit the details now.
+	 *
+	 * Likewise for counters.
 	 */
 	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_per_thread_counters(tr2_tgt_emit_a_counter);
 
 	/*
-	 * Add stopwatch timer data from the current (non-main) thread
-	 * to the final totals.  (We'll accumulate data for the main
-	 * thread later during "atexit".)
+	 * Add stopwatch timer and counter data from the current
+	 * (non-main) thread to the final totals.  (We'll accumulate
+	 * data for the main thread later during "atexit".)
 	 */
 	tr2tls_lock();
 	tr2_update_final_timers();
+	tr2_update_final_counters();
 	tr2tls_unlock();
 
 	for_each_wanted_builtin (j, tgt_j)
@@ -870,6 +897,17 @@ void trace2_timer_stop(enum trace2_timer_id tid)
 	tr2_stop_timer(tid);
 }
 
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (cid < 0 || cid >= TRACE2_NUMBER_OF_COUNTERS)
+		BUG("trace2_counter_add: invalid counter id: %d", cid);
+
+	tr2_counter_increment(cid, value);
+}
+
 const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
diff --git a/trace2.h b/trace2.h
index 2d146fb32fc..da670ffd26c 100644
--- a/trace2.h
+++ b/trace2.h
@@ -52,6 +52,7 @@ struct json_writer;
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
  * [] trace2_timer*     -- stopwatch timers (messages are deferred).
+ * [] trace2_counter*   -- global counters (messages are deferred).
  */
 
 /*
@@ -528,6 +529,42 @@ enum trace2_timer_id {
 void trace2_timer_start(enum trace2_timer_id tid);
 void trace2_timer_stop(enum trace2_timer_id tid);
 
+/*
+ * Define the set of global counters.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any values added to this enum must also be added to the
+ * `tr2_counter_metadata[]` in `trace2/tr2_ctr.c`.
+ */
+enum trace2_counter_id {
+	/*
+	 * Define two counters for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
+
+	/* Add additional counter definitions before here. */
+	TRACE2_NUMBER_OF_COUNTERS
+};
+
+/*
+ * Increase the named global counter by value.
+ *
+ * Note that this adds `value` to the current thread's partial sum for
+ * this counter (without locking) and that the complete sum is not
+ * available until all threads have exited, so it does not return the
+ * new value of the counter.
+ */
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value);
+
 /*
  * Optional platform-specific code to dump information about the
  * current and any parent process(es).  This is intended to allow
diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c
new file mode 100644
index 00000000000..483ca7c308f
--- /dev/null
+++ b/trace2/tr2_ctr.c
@@ -0,0 +1,101 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tgt.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_ctr.h"
+
+/*
+ * A global counter block to aggregate values from the partial sums
+ * from each thread.
+ */
+static struct tr2_counter_block final_counter_block; /* access under tr2tls_mutex */
+
+/*
+ * Define metadata for each global counter.
+ *
+ * This array must match the "enum trace2_counter_id" and the values
+ * in "struct tr2_counter_block.counter[*]".
+ */
+static struct tr2_counter_metadata tr2_counter_metadata[TRACE2_NUMBER_OF_COUNTERS] = {
+	[TRACE2_COUNTER_ID_TEST1] = {
+		.category = "test",
+		.name = "test1",
+		.want_per_thread_events = 0,
+	},
+	[TRACE2_COUNTER_ID_TEST2] = {
+		.category = "test",
+		.name = "test2",
+		.want_per_thread_events = 1,
+	},
+
+	/* Add additional metadata before here. */
+};
+
+void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_counter *c = &ctx->counter_block.counter[cid];
+
+	c->value += value;
+
+	ctx->used_any_counter = 1;
+	if (tr2_counter_metadata[cid].want_per_thread_events)
+		ctx->used_any_per_thread_counter = 1;
+}
+
+void tr2_update_final_counters(void)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_counter_id cid;
+
+	if (!ctx->used_any_counter)
+		return;
+
+	/*
+	 * Accessing `final_counter_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) {
+		struct tr2_counter *c_final = &final_counter_block.counter[cid];
+		const struct tr2_counter *c = &ctx->counter_block.counter[cid];
+
+		c_final->value += c->value;
+	}
+}
+
+void tr2_emit_per_thread_counters(tr2_tgt_evt_counter_t *fn_apply)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_counter_id cid;
+
+	if (!ctx->used_any_per_thread_counter)
+		return;
+
+	/*
+	 * For each counter, if the counter wants per-thread events
+	 * and this thread used it (the value is non-zero), emit it.
+	 */
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++)
+		if (tr2_counter_metadata[cid].want_per_thread_events &&
+		    ctx->counter_block.counter[cid].value)
+			fn_apply(&tr2_counter_metadata[cid],
+				 &ctx->counter_block.counter[cid],
+				 0);
+}
+
+void tr2_emit_final_counters(tr2_tgt_evt_counter_t *fn_apply)
+{
+	enum trace2_counter_id cid;
+
+	/*
+	 * Accessing `final_counter_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++)
+		if (final_counter_block.counter[cid].value)
+			fn_apply(&tr2_counter_metadata[cid],
+				 &final_counter_block.counter[cid],
+				 1);
+}
diff --git a/trace2/tr2_ctr.h b/trace2/tr2_ctr.h
new file mode 100644
index 00000000000..a2267ee9901
--- /dev/null
+++ b/trace2/tr2_ctr.h
@@ -0,0 +1,104 @@
+#ifndef TR2_CTR_H
+#define TR2_CTR_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow global "counters".
+ *
+ * Counters can be used to count interesting activity that does not
+ * fit the "region and data" model, such as code called from many
+ * different regions and/or where you want to count a number of items,
+ * but don't have control of when the last item will be processed,
+ * such as counting the number of calls to `lstat()`.
+ *
+ * Counters differ from Trace2 "data" events.  Data events are emitted
+ * immediately and are appropriate for documenting loop counters at
+ * the end of a region, for example.  Counter values are accumulated
+ * during the program and final counter values are emitted at program
+ * exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set of
+ * counters and counter ids using a fixed size "counter block" array
+ * in thread-local storage.  This gives us constant time, lock-free
+ * access to each counter within each thread.  This lets us avoid the
+ * complexities of dynamically allocating a counter and sharing that
+ * definition with other threads.
+ *
+ * Each thread uses the counter block in its thread-local storage to
+ * increment partial sums for each counter (without locking).  When a
+ * thread exits, those partial sums are (under lock) added to the
+ * global final sum.
+ *
+ * Partial sums for each counter are optionally emitted when a thread
+ * exits.
+ *
+ * Final sums for each counter are emitted between the "exit" and
+ * "atexit" events.
+ *
+ * A parallel "counter metadata" table contains the "category" and
+ * "name" fields for each counter.  This eliminates the need to
+ * include those args in the various counter APIs.
+ */
+
+/*
+ * The definition of an individual counter as used by an individual
+ * thread (and later in aggregation).
+ */
+struct tr2_counter {
+	uint64_t value;
+};
+
+/*
+ * Metadata for a counter.
+ */
+struct tr2_counter_metadata {
+	const char *category;
+	const char *name;
+
+	/*
+	 * True if we should emit per-thread events for this counter
+	 * when individual threads exit.
+	 */
+	unsigned int want_per_thread_events:1;
+};
+
+/*
+ * A compile-time fixed block of counters to insert into thread-local
+ * storage.  This wrapper is used to avoid quirks of C and the usual
+ * need to pass an array size argument.
+ */
+struct tr2_counter_block {
+	struct tr2_counter counter[TRACE2_NUMBER_OF_COUNTERS];
+};
+
+/*
+ * Private routines used by trace2.c to increment a counter for the
+ * current thread.
+ */
+void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value);
+
+/*
+ * Add the current thread's counter data to the global totals.
+ * This is called during thread-exit.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_update_final_counters(void);
+
+/*
+ * Emit per-thread counter data for the current thread.
+ * This is called during thread-exit.
+ */
+void tr2_emit_per_thread_counters(tr2_tgt_evt_counter_t *fn_apply);
+
+/*
+ * Emit global counter values.
+ * This is called during atexit handling.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_emit_final_counters(tr2_tgt_evt_counter_t *fn_apply);
+
+#endif /* TR2_CTR_H */
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 2a80bef0df5..94a334d980a 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -6,6 +6,8 @@ struct repository;
 struct json_writer;
 struct tr2_timer_metadata;
 struct tr2_timer;
+struct tr2_counter_metadata;
+struct tr2_counter;
 
 /*
  * Function prototypes for a TRACE2 "target" vtable.
@@ -102,6 +104,10 @@ typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta,
 				  const struct tr2_timer *timer,
 				  int is_final_data);
 
+typedef void(tr2_tgt_evt_counter_t)(const struct tr2_counter_metadata *meta,
+				    const struct tr2_counter *counter,
+				    int is_final_data);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -139,6 +145,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
 	tr2_tgt_evt_timer_t                     *pfn_timer;
+	tr2_tgt_evt_counter_t                   *pfn_counter;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 1196da89ba4..bb0653e0e6f 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -642,6 +642,24 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	jw_release(&jw);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	jw_object_string(&jw, "category", meta->category);
+	jw_object_string(&jw, "name", meta->name);
+	jw_object_intmax(&jw, "count", counter->value);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	.pdst = &tr2dst_event,
 
@@ -674,4 +692,5 @@ struct tr2_tgt tr2_tgt_event = {
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = NULL,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 3888c10ef50..b21508e06f7 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -351,6 +351,21 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, "%s %s/%s value:%"PRIu64,
+		    event_name, meta->category, meta->name,
+		    counter->value);
+
+	normal_io_write_fl(__FILE__, __LINE__, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_normal = {
 	.pdst = &tr2dst_normal,
 
@@ -383,4 +398,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	.pfn_data_json_fl = NULL,
 	.pfn_printf_va_fl = fn_printf_va_fl,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 064aefbbebb..cbf8aefd56c 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -582,6 +582,22 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, "name:%s value:%"PRIu64,
+		    meta->name,
+		    counter->value);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, NULL, NULL,
+			 meta->category, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	.pdst = &tr2dst_perf,
 
@@ -614,4 +630,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = fn_printf_va_fl,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 4f8e24f1749..e306c9bf3ec 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_tmr.h"
 
 /*
@@ -16,8 +17,11 @@ struct tr2tls_thread_ctx {
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 	struct tr2_timer_block timer_block;
+	struct tr2_counter_block counter_block;
 	unsigned int used_any_timer:1;
 	unsigned int used_any_per_thread_timer:1;
+	unsigned int used_any_counter:1;
+	unsigned int used_any_per_thread_counter:1;
 	char thread_name[FLEX_ARRAY];
 };
 
-- 
gitgitgadget


* Re: [PATCH 6/9] trace2: convert ctx.thread_name to flex array
  2022-10-04 16:20 ` [PATCH 6/9] trace2: convert ctx.thread_name to flex array Jeff Hostetler via GitGitGadget
@ 2022-10-05 11:14   ` Ævar Arnfjörð Bjarmason
  2022-10-06 16:28     ` Jeff Hostetler
  2022-10-10 18:31     ` Jeff Hostetler
  2022-10-05 18:03   ` Junio C Hamano
  1 sibling, 2 replies; 73+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-05 11:14 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler


On Tue, Oct 04 2022, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
> to a "flex array" at the end of the context structure.
>
> The `thread_name` field is a constant string that is constructed when
> the context is created.  Using a (non-const) `strbuf` structure for it
> caused some confusion in the past because it implied that someone
> could rename a thread after it was created.

I think it's been long enough that we could use a reminder about the
"some confusion", i.e. if it was a bug report or something else.

> That usage was not intended.  Changing it to a "flex array" will
> hopefully make the intent more clear.

I see we had some back & forth in the original submission, although
honestly I only skimmed this series this time around, had forgotten
about that, had this pop out at me, and then found my earlier comments.

I see that exchange didn't end as well as I'd hoped[1], and hopefully we
can avoid that here. So having looked at this with fresh eyes maybe
these comments/questions help:

 * I'm unable to bridge the gap from (paraphrased) "we must change the
   type" to "mak[ing] the [read-only] intent more clear".

   I.e. if you go across the codebase and look at various non-const
   "char name[FLEX_ARRAY]" and add a "const" to them you'll find cases
   where we re-write the "FLEX_ARRAY" string, e.g. the one in archive.c
   is one of those (the first grep hit, I stopped looking for others at
   that point).

   Making it "const" will yield:

      archive.c: In function ‘queue_directory’:
      archive.c:206:29: error: passing argument 1 of ‘xsnprintf’ discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
        206 |         d->len = xsnprintf(d->path, len, "%.*s%s/", (int)base->len, base->buf, filename);

   So aside from anything else (and I may be misunderstanding this) why
   would changing it to a FLEX_ARRAY tell a confused API user that the
   field shouldn't be messed with, in a way that the "strbuf" doesn't?

 * Now, quoting from the below:

> [...]
> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
> index 39b41fd2487..89437e773f6 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -34,7 +34,18 @@ void tr2tls_start_process_clock(void)
>  struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
>  					     uint64_t us_thread_start)
>  {
> -	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
> +	struct tr2tls_thread_ctx *ctx;
> +	struct strbuf buf_name = STRBUF_INIT;
> +	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
> +
> +	if (thread_id)
> +		strbuf_addf(&buf_name, "th%02d:", thread_id);
> +	strbuf_addstr(&buf_name, name_hint);
> +
> +	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
> +	strbuf_release(&buf_name);
> +
> +	ctx->thread_id = thread_id;
>  
>  	/*
>  	 * Implicitly "tr2tls_push_self()" to capture the thread's start
> @@ -45,15 +56,6 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
>  	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
>  	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
>  
> -	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
> -
> -	strbuf_init(&ctx->thread_name, 0);
> -	if (ctx->thread_id)
> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
> -	strbuf_addstr(&ctx->thread_name, name_hint);
> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
> -
>  	pthread_setspecific(tr2tls_key, ctx);
>  
>  	return ctx;

I found this quote hard to follow because there are functional changes
there mixed up with code rearrangement; consider leading with a commit
like:
	
	diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
	index 39b41fd2487..d7952062007 100644
	--- a/trace2/tr2_tls.c
	+++ b/trace2/tr2_tls.c
	@@ -31,10 +31,24 @@ void tr2tls_start_process_clock(void)
	 	tr2tls_us_start_process = getnanotime() / 1000;
	 }
	 
	+static void fill_thread_name(struct strbuf *buf, const char *name_hint,
	+			     int thread_id)
	+{
	+	if (thread_id)
	+		strbuf_addf(buf, "th%02d:", thread_id);
	+	strbuf_addstr(buf, name_hint);
	+	if (buf->len > TR2_MAX_THREAD_NAME)
	+		strbuf_setlen(buf, TR2_MAX_THREAD_NAME);
	+
	+}
	+
	 struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
	 					     uint64_t us_thread_start)
	 {
	-	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
	+	struct tr2tls_thread_ctx *ctx;
	+	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
	+
	+	ctx = xcalloc(1, sizeof(*ctx));
	 
	 	/*
	 	 * Implicitly "tr2tls_push_self()" to capture the thread's start
	@@ -45,14 +59,8 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
	 	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
	 	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
	 
	-	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
	-
	 	strbuf_init(&ctx->thread_name, 0);
	-	if (ctx->thread_id)
	-		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
	-	strbuf_addstr(&ctx->thread_name, name_hint);
	-	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
	-		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
	+	fill_thread_name(&ctx->thread_name, name_hint, thread_id);
	 
	 	pthread_setspecific(tr2tls_key, ctx);

I see from [1] that I commented on that before, i.e. that it was
"looks-to-be-unrelated"; hopefully the above clarifies that, i.e. that
it's "unrelated" in the sense that we can do it separately with no
functional change, making the real change smaller.

If I then rebase your change on top of that I get the below diff, which
IMO makes it much clearer what's going on. Commenting on that:
	
	diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
	index d7952062007..c540027e75d 100644
	--- a/trace2/tr2_tls.c
	+++ b/trace2/tr2_tls.c
	@@ -37,18 +37,21 @@ static void fill_thread_name(struct strbuf *buf, const char *name_hint,
	 	if (thread_id)
	 		strbuf_addf(buf, "th%02d:", thread_id);
	 	strbuf_addstr(buf, name_hint);
	-	if (buf->len > TR2_MAX_THREAD_NAME)
	-		strbuf_setlen(buf, TR2_MAX_THREAD_NAME);
	-
	 }

Okay, so as explained in the commit message we no longer need to worry
about this limit, but I think leading with a commit that makes just that
change would help. I.e. wouldn't keeping the strbuf and doing this
truncation in tr2_tgt_perf.c give you the functional change first,
without the type change?

Doing it this way means we're changing the type, and also removing the
limit on thread names for non-perf backends.
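
As a rough sketch of what "truncate at output time" could look like
(standalone C, not the real tr2_tgt_perf.c code; DEMO_MAX_THREAD_NAME and
demo_format_thread_column() are names made up for illustration), the
"%-*.*s" conversion both pads and cuts a string to a fixed column width:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical column width; the real constant lives in trace2. */
#define DEMO_MAX_THREAD_NAME 24

/*
 * Write "name" into "out" as a fixed-width column: short names are
 * padded on the right with spaces (the "-" flag plus the width), long
 * names are cut off at the column width (the precision).
 * Returns the number of characters written, excluding the NUL.
 */
static int demo_format_thread_column(char *out, size_t outsz, const char *name)
{
	assert(out && name && outsz > DEMO_MAX_THREAD_NAME);
	return snprintf(out, outsz, "%-*.*s",
			DEMO_MAX_THREAD_NAME, DEMO_MAX_THREAD_NAME, name);
}
```

With this shape the stored name can be arbitrarily long and only the
column-aligned backend pays for the limit.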
	 
	 struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
	 					     uint64_t us_thread_start)
	 {
	 	struct tr2tls_thread_ctx *ctx;
	+	struct strbuf buf_name = STRBUF_INIT;

Okay, now our scratch buffer is function local, but:

	 	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
	 
	-	ctx = xcalloc(1, sizeof(*ctx));
	+	fill_thread_name(&buf_name, name_hint, thread_id);

We still need to malloc() that "struct strbuf", this is the main thing I
found confusing and why I didn't see the point in the original
series. I.e. we can normally pull compiler tricks with FLEX_ARRAY to
avoid allocations.

But here you need to format this string anyway, so we've already
malloc'd it, you just....

	+
	+	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);

...memcpy() it to the FLEX_ARRAY here, but then...

	+	strbuf_release(&buf_name);

...we have to release this thing we malloc()'d, which was previously the
pointer in the struct. 

	+
	+	ctx->thread_id = thread_id;
	 
	 	/*
	 	 * Implicitly "tr2tls_push_self()" to capture the thread's start

So, I don't really see the point of this "flex array for implicit const"
per the above, you noted in [1] "I convert the field to a flex-array to
avoid [...] the allocation" but what allocation are we really avoiding
here?

We still have to allocate the strbuf as before, we just now allocate the
struct as before + the length of that strbuf, then we can free the
strbuf.
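
To make the allocation pattern concrete, here is a minimal standalone
sketch in plain C99 of roughly what the FLEX_ALLOC_MEM() step amounts to.
The "demo_ctx" struct and demo_ctx_new() are stand-ins invented for this
example, and calloc() is used instead of git's xcalloc():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for "struct tr2tls_thread_ctx". */
struct demo_ctx {
	int thread_id;
	char thread_name[]; /* C99 flexible array member */
};

/*
 * One zeroed allocation sized for the struct plus the string and its
 * NUL, followed by a memcpy(). Whatever buffer "name" was formatted
 * in is still owned by the caller and has to be freed separately.
 */
static struct demo_ctx *demo_ctx_new(int thread_id, const char *name, size_t len)
{
	struct demo_ctx *ctx;

	assert(name);
	ctx = calloc(1, sizeof(*ctx) + len + 1);
	if (!ctx)
		return NULL;
	ctx->thread_id = thread_id;
	memcpy(ctx->thread_name, name, len);
	return ctx;
}
```

So the name ends up in the same block as the struct, but the scratch
buffer it was formatted in was a separate allocation all the same.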

Is it that the end memory use is lower because while we have an
allocation for the strbuf we release it right away, and the compiler (on
some platforms?) can play tricks with sticking this into padding it was
going to put there anyway, given the length of the string?

I can think of ways this *might* matter, I'm just mainly saying that
you're leaving the reader guessing still.

Aside: I can imagine that we *could* actually avoid an allocation here
by being more sneaky.

I.e. you could FLEX_ALLOC_MEM() before populating the strbuf, as we know
the format is "th%02d:", so the space we need for the string is:

	strlen(name_hint) + strlen("th") + 4 /* %02d: + \0 */;

Then you could memset(&ctx->thread_name, 0, len) and strbuf_attach() the
pointer to the start of that, and voila, you could use strbuf_addf() to
do the %02d format part of that.

But I still don't see how this is an area that justifies that sort of
micro-optimization (or worrying about strbuf v.s. flex array),
i.e. don't we usually just have max ncpu threads anyway (the format
implies max 99), so a few strings like "th01:main" aren't going to cost
us much, are they?

<tries it out>

Anyway, if this area was actually performance critical and we *really
cared* about avoiding allocations wouldn't we want to skip both the
"strbuf" there and the "FLEX_ARRAY", and just save away the
"thread_hint" (which the caller hardcodes) and "thread_nr", and then
append on-the-fly?

I came up with the below to do that, it passes all tests, but contains
micro-optimizations that I don't think we need (e.g. I understood you
wanted to avoid printf, so it does that).

But I think it's a useful point of discussion. What test(s) do you have
where the "master" version, FLEX_ARRAY version, and just not strbuf
formatting the thing at all differ?
	
	diff --git a/json-writer.c b/json-writer.c
	index f1cfd8fa8c6..124ad72d200 100644
	--- a/json-writer.c
	+++ b/json-writer.c
	@@ -161,6 +161,47 @@ void jw_object_string(struct json_writer *jw, const char *key, const char *value
	 	append_quoted_string(&jw->json, value);
	 }
	 
	+void jw_strbuf_add_thread_name(struct strbuf *out, const char *thread_hint,
	+			       int thread_id, int max_len)
	+{
	+	size_t oldlen = out->len;
	+
	+	if (thread_id) {
	+		strbuf_addf(out, "th");
	+		/*
	+		 * We're avoiding printf here when on-the-fly
	+		 * formatting, but why?
	+		 */
	+		if (thread_id < 10) {
	+			strbuf_addch(out, '0');
	+			strbuf_addch(out, thread_id % 10 + '0');
	+		} else {
	+			strbuf_addch(out, thread_id / 10 + '0');
	+			strbuf_addch(out, thread_id % 10 + '0');
	+		}
	+		strbuf_addch(out, ':');
	+	}
	+	if (max_len) {
	+		int added = out->len - oldlen;
	+		int limit = max_len - added;
	+
	+		strbuf_addf(out, "%.*s", limit, thread_hint);
	+	} else {
	+		strbuf_addstr(out, thread_hint);
	+	}
	+}
	+
	+void jw_object_thread(struct json_writer *jw, const char *thread_hint,
	+		      int thread_id)
	+{
	+	struct strbuf *out = &jw->json;
	+
	+	object_common(jw, "thread");
	+	strbuf_addch(out, '"');
	+	jw_strbuf_add_thread_name(out, thread_hint, thread_id, 0);
	+	strbuf_addch(out, '"');
	+}
	+
	 void jw_object_intmax(struct json_writer *jw, const char *key, intmax_t value)
	 {
	 	object_common(jw, key);
	diff --git a/json-writer.h b/json-writer.h
	index 209355e0f12..51b78296f8a 100644
	--- a/json-writer.h
	+++ b/json-writer.h
	@@ -77,6 +77,10 @@ void jw_array_begin(struct json_writer *jw, int pretty);
	 
	 void jw_object_string(struct json_writer *jw, const char *key,
	 		      const char *value);
	+void jw_strbuf_add_thread_name(struct strbuf *out, const char *thread_hint,
	+			       int thread_id, int max_len);
	+void jw_object_thread(struct json_writer *jw, const char *thread_hint,
	+		      const int thread_id);
	 void jw_object_intmax(struct json_writer *jw, const char *key, intmax_t value);
	 void jw_object_double(struct json_writer *jw, const char *key, int precision,
	 		      double value);
	diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
	index bb0653e0e6f..6e480fce34a 100644
	--- a/trace2/tr2_tgt_event.c
	+++ b/trace2/tr2_tgt_event.c
	@@ -91,7 +91,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,
	 
	 	jw_object_string(jw, "event", event_name);
	 	jw_object_string(jw, "sid", tr2_sid_get());
	-	jw_object_string(jw, "thread", ctx->thread_name);
	+	jw_object_thread(jw, ctx->thread_hint, ctx->thread_id);
	 
	 	/*
	 	 * In brief mode, only emit <time> on these 2 event types.
	diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
	index cbf8aefd56c..9f310756349 100644
	--- a/trace2/tr2_tgt_perf.c
	+++ b/trace2/tr2_tgt_perf.c
	@@ -71,6 +71,8 @@ static void perf_fmt_prepare(const char *event_name,
	 			     const char *category, struct strbuf *buf)
	 {
	 	int len;
	+	int oldlen;
	+	int thread_pad;
	 
	 	strbuf_setlen(buf, 0);
	 
	@@ -109,11 +111,11 @@ static void perf_fmt_prepare(const char *event_name,
	 	}
	 
	 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
	-	strbuf_addf(buf, "%-*.*s | %-*s | ",
	-		    TR2FMT_PERF_MAX_THREAD_NAME,
	-		    TR2FMT_PERF_MAX_THREAD_NAME,
	-		    ctx->thread_name,
	-		    TR2FMT_PERF_MAX_EVENT_NAME,
	+	oldlen = buf->len;
	+	jw_strbuf_add_thread_name(buf, ctx->thread_hint, ctx->thread_id,
	+				  TR2FMT_PERF_MAX_THREAD_NAME);
	+	thread_pad = TR2FMT_PERF_MAX_THREAD_NAME - (buf->len - oldlen);
	+	strbuf_addf(buf, "%-*s | %-*s | ", thread_pad, "", TR2FMT_PERF_MAX_EVENT_NAME,
	 		    event_name);
	 
	 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
	diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
	index 02117f808eb..9959ec9b160 100644
	--- a/trace2/tr2_tls.c
	+++ b/trace2/tr2_tls.c
	@@ -31,26 +31,13 @@ void tr2tls_start_process_clock(void)
	 	tr2tls_us_start_process = getnanotime() / 1000;
	 }
	 
	-static void fill_thread_name(struct strbuf *buf, const char *name_hint,
	-			     int thread_id)
	-{
	-	if (thread_id)
	-		strbuf_addf(buf, "th%02d:", thread_id);
	-	strbuf_addstr(buf, name_hint);
	-}
	-
	 struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
	 					     uint64_t us_thread_start)
	 {
	-	struct tr2tls_thread_ctx *ctx;
	-	struct strbuf buf_name = STRBUF_INIT;
	+	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
	 	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
	 
	-	fill_thread_name(&buf_name, name_hint, thread_id);
	-
	-	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
	-	strbuf_release(&buf_name);
	-
	+	ctx->thread_hint = name_hint;
	 	ctx->thread_id = thread_id;
	 
	 	/*
	@@ -120,7 +107,8 @@ void tr2tls_pop_self(void)
	 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
	 
	 	if (!ctx->nr_open_regions)
	-		BUG("no open regions in thread '%s'", ctx->thread_name);
	+		BUG("no open regions in thread '%s' '%d'", ctx->thread_hint,
	+		    ctx->thread_id);
	 
	 	ctx->nr_open_regions--;
	 }
	diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
	index e306c9bf3ec..f873615ebef 100644
	--- a/trace2/tr2_tls.h
	+++ b/trace2/tr2_tls.h
	@@ -21,8 +21,8 @@ struct tr2tls_thread_ctx {
	 	unsigned int used_any_timer:1;
	 	unsigned int used_any_per_thread_timer:1;
	 	unsigned int used_any_counter:1;
	+	const char *thread_hint;
	 	unsigned int used_any_per_thread_counter:1;
	-	char thread_name[FLEX_ARRAY];
	 };
	 
	 /*
	



1. https://lore.kernel.org/git/e3fd64ef-9e26-19da-7327-38ab77ae359a@jeffhostetler.com/



> @@ -95,7 +97,6 @@ void tr2tls_unset_self(void)
>  
>  	pthread_setspecific(tr2tls_key, NULL);
>  
> -	strbuf_release(&ctx->thread_name);
>  	free(ctx->array_us_start);
>  	free(ctx);
>  }
> @@ -113,7 +114,7 @@ void tr2tls_pop_self(void)
>  	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
>  
>  	if (!ctx->nr_open_regions)
> -		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
> +		BUG("no open regions in thread '%s'", ctx->thread_name);
>  
>  	ctx->nr_open_regions--;
>  }
> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index f1ee58305d6..be0bc73d08f 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -9,17 +9,12 @@
>   * There is NO relation to "transport layer security".
>   */
>  
> -/*
> - * Arbitry limit for thread names for column alignment.
> - */
> -#define TR2_MAX_THREAD_NAME (24)
> -
>  struct tr2tls_thread_ctx {
> -	struct strbuf thread_name;
>  	uint64_t *array_us_start;
>  	size_t alloc;
>  	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
>  	int thread_id;
> +	char thread_name[FLEX_ARRAY];
>  };
>  
>  /*
> @@ -32,8 +27,6 @@ struct tr2tls_thread_ctx {
>   * upon the name of the thread-proc function).  For example:
>   *     { .thread_id=10, .thread_name="th10fsm-listen" }
>   * This helps to identify and distinguish messages from concurrent threads.
> - * The ctx.thread_name field is truncated if necessary to help with column
> - * alignment in printf-style messages.
>   *
>   * In this and all following functions the term "self" refers to the
>   * current thread.



* Re: [PATCH 0/9] Trace2 timers and counters and some cleanup
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                   ` (8 preceding siblings ...)
  2022-10-04 16:20 ` [PATCH 9/9] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
@ 2022-10-05 13:04 ` Ævar Arnfjörð Bjarmason
  2022-10-06 15:45   ` Jeff Hostetler
  2022-10-06 18:12 ` Derrick Stolee
  2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
  11 siblings, 1 reply; 73+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-05 13:04 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler


On Tue, Oct 04 2022, Jeff Hostetler via GitGitGadget wrote:

> This patch series add stopwatch timers and global counters to the trace2
> logging facility. It also does a little housecleaning.
>
> This is basically a rewrite of the series that I submitted back in December
> 2021: [1] and [2]. Hopefully, it addresses all of the concerns raised back
> then and does it in a way that avoids the issues that stalled that effort.
>
> First we start with a few housecleaning commits:
>
>  * The first 2 commits are unrelated to this effort, but were required to
>    get the existing code to compile on my Mac with Clang 11.0.0 with
>    DEVELOPER=1. Those can be dropped if there is a better way to do this.

This seems like a good thing to have, but there's no subsequent changes
to those two files on this topic, so is this just a "to get it building
on my laptop..." stashed-on?

I think if so it makes sense to split these up, and as feedback on 1-2/9:
Let's note what compiler/version & what warning we got, the details
there for anyone to dig this up later are missing, i.e. if we ever want
to remove the workaround syntax.

>  * The 3rd commit is in response a concern about using int rather than
>    size_t for nr and alloc in an ALLOC_GROW() in existing trace2 code.

This small bit of cleanup also could perhaps be submitted separately?
It's unclear (and I read the concern in the initial thread) if this is
required by anything that follows.



* Re: [PATCH 6/9] trace2: convert ctx.thread_name to flex array
  2022-10-04 16:20 ` [PATCH 6/9] trace2: convert ctx.thread_name to flex array Jeff Hostetler via GitGitGadget
  2022-10-05 11:14   ` Ævar Arnfjörð Bjarmason
@ 2022-10-05 18:03   ` Junio C Hamano
  2022-10-06 21:05     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2022-10-05 18:03 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
> to a "flex array" at the end of the context structure.
>
> The `thread_name` field is a constant string that is constructed when
> the context is created.  Using a (non-const) `strbuf` structure for it
> caused some confusion in the past because it implied that someone
> could rename a thread after it was created.  That usage was not
> intended.  Changing it to a "flex array" will hopefully make the
> intent more clear.

Surely, "const struct strbuf name;" member would be an oxymoron, and
I agree that this should follow "use strbuf as an easy-to-work-with
mechanism to come up with a string, and bake the final value into a
struct as a member of type 'const char []'" pattern.

I recall saying why I thought the flex array was overkill, though.

You have been storing an up-to-24-byte human readable name by
embedding a strbuf that has two size_t plus a pointer (i.e. 24-bytes
even on Windows), and as TR2_MAX_THREAD_NAME is capped at 24 bytes
anyway, an embedded fixed-size thread_name[TR2_MAX_THREAD_NAME+1]
member may be the simplest thing to do, I suspect.

If we were to allow arbitrarily long thread_name[], which may not be
a bad thing to do (e.g. we do not have to worry about truncation
making two names ambiguous, for example), then the flex array is the
right direction to go in, though.
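
For illustration only, the fixed-size alternative could be sketched like
this in standalone C. The names are hypothetical (DEMO_MAX_THREAD_NAME
stands in for TR2_MAX_THREAD_NAME, and the struct is a cut-down stand-in
for the real context), so treat it as a sketch rather than a patch:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for TR2_MAX_THREAD_NAME. */
#define DEMO_MAX_THREAD_NAME 24

struct demo_fixed_ctx {
	int thread_id;
	char thread_name[DEMO_MAX_THREAD_NAME + 1]; /* +1 for the NUL */
};

/*
 * Format the name straight into the embedded array; snprintf() cuts
 * it off at the buffer size, so no separate allocation or free is
 * needed. Returns the length the name would have had untruncated, so
 * a caller can notice when truncation happened.
 */
static int demo_fixed_ctx_init(struct demo_fixed_ctx *ctx, int thread_id,
			       const char *name_hint)
{
	assert(ctx && name_hint);
	ctx->thread_id = thread_id;
	if (thread_id)
		return snprintf(ctx->thread_name, sizeof(ctx->thread_name),
				"th%02d:%s", thread_id, name_hint);
	return snprintf(ctx->thread_name, sizeof(ctx->thread_name),
			"%s", name_hint);
}
```

The cost is that the truncation stays baked into the stored name, which
is exactly the ambiguity concern mentioned above.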


* Re: [PATCH 0/9] Trace2 timers and counters and some cleanup
  2022-10-05 13:04 ` [PATCH 0/9] Trace2 timers and counters and some cleanup Ævar Arnfjörð Bjarmason
@ 2022-10-06 15:45   ` Jeff Hostetler
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-06 15:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler



On 10/5/22 9:04 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Oct 04 2022, Jeff Hostetler via GitGitGadget wrote:
> 
>> This patch series add stopwatch timers and global counters to the trace2
>> logging facility. It also does a little housecleaning.
>>
>> This is basically a rewrite of the series that I submitted back in December
>> 2021: [1] and [2]. Hopefully, it addresses all of the concerns raised back
>> then and does it in a way that avoids the issues that stalled that effort.
>>
>> First we start with a few housecleaning commits:
>>
>>   * The first 2 commits are unrelated to this effort, but were required to
>>     get the existing code to compile on my Mac with Clang 11.0.0 with
>>     DEVELOPER=1. Those can be dropped if there is a better way to do this.
> 
> This seems like a good thing to have, but there's no subsequent changes
> to those two files on this topic, so is this just a "to get it building
> on my laptop..." stashed-on?

Right. I needed them to get "main" to build on my laptop before I
started hacking.  I debated sending them in separately, but everyone
was busy with the 2.38 release and I didn't want to add to the noise for
such a minor thing, since all the CI builds were green...

But, yeah, I can do that.

> 
> I think if so it makes sense to split these up, and as feeback on 1-2/9:
> Let's note what compiler/version & what warning we got, the details
> there for anyone to dig this up later are missing, i.e. if we ever want
> to remove the workaround syntax.
> 
>>   * The 3rd commit is in response a concern about using int rather than
>>     size_t for nr and alloc in an ALLOC_GROW() in existing trace2 code.
> 
> This small bit of cleanup also could perhaps be submitted separately?
> It's unclear (and I read the concern in the initial thread) if this is
> required by anything that follows.
> 

Nothing requires this. It was just another "while I'm here" fixup.
However, those lines are very close to new/changed lines that I added
for the timers and counters, so it would probably cause collisions if
sent independently.  So I'd like to leave them in this series to
simplify things.

Thanks,
Jeff


* Re: [PATCH 6/9] trace2: convert ctx.thread_name to flex array
  2022-10-05 11:14   ` Ævar Arnfjörð Bjarmason
@ 2022-10-06 16:28     ` Jeff Hostetler
  2022-10-10 18:31     ` Jeff Hostetler
  1 sibling, 0 replies; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-06 16:28 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler



On 10/5/22 7:14 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Oct 04 2022, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
>> to a "flex array" at the end of the context structure.
>>
>> The `thread_name` field is a constant string that is constructed when
>> the context is created.  Using a (non-const) `strbuf` structure for it
>> caused some confusion in the past because it implied that someone
>> could rename a thread after it was created.
> 
> I think it's been long enough that we could use a reminder about the
> "some confusion", i.e. if it was a bug report or something else.
> 
>> That usage was not intended.  Changing it to a "flex array" will
>> hopefully make the intent more clear.
> 
> I see we had some back & forth back in the original submission, although
> honestly I skimmed this this time around, had forgetten about that, and
> had this pop out at me, and then found my earlier comments.
> 
> I see that exchange didn't end as well as I'd hoped[1], and hopefully we
> can avoid that here. So having looked at this with fresh eyes maybe
> these comments/questions help:

Yeah, those conversations went rather poorly.  And yes, I'd like to
avoid all of that.  There's a lot in your note here and it'll take a
little while to digest and respond.  But I did want to ACK, sooner
rather than later, that we agree on that.

And yes, I could split out the truncation into a separate commit.
And then revisit the storage change.

Thanks
Jeff



* Re: [PATCH 0/9] Trace2 timers and counters and some cleanup
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                   ` (9 preceding siblings ...)
  2022-10-05 13:04 ` [PATCH 0/9] Trace2 timers and counters and some cleanup Ævar Arnfjörð Bjarmason
@ 2022-10-06 18:12 ` Derrick Stolee
  2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
  11 siblings, 0 replies; 73+ messages in thread
From: Derrick Stolee @ 2022-10-06 18:12 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget, git; +Cc: Jeff Hostetler

On 10/4/22 12:19 PM, Jeff Hostetler via GitGitGadget wrote:
> This patch series add stopwatch timers and global counters to the trace2
> logging facility. It also does a little housecleaning.
> 
> This is basically a rewrite of the series that I submitted back in December
> 2021: [1] and [2]. Hopefully, it addresses all of the concerns raised back
> then and does it in a way that avoids the issues that stalled that effort.

Thanks for working on this again. As I mentioned earlier [3], this
would be really helpful when doing performance investigations. I
also plan to insert some timers and counters as a follow-up when
this series stabilizes.

[3] https://lore.kernel.org/git/pull.1365.git.1663938034607.gitgitgadget@gmail.com/

I was unable to find further improvements than the ones you
already acknowledged for your v2.

Thanks,
-Stolee


* Re: [PATCH 6/9] trace2: convert ctx.thread_name to flex array
  2022-10-05 18:03   ` Junio C Hamano
@ 2022-10-06 21:05     ` Ævar Arnfjörð Bjarmason
  2022-10-06 21:50       ` Junio C Hamano
  2022-10-10 18:39       ` [PATCH 6/9] trace2: convert ctx.thread_name to flex array Jeff Hostetler
  0 siblings, 2 replies; 73+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-06 21:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler


On Wed, Oct 05 2022, Junio C Hamano wrote:

> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
>> to a "flex array" at the end of the context structure.
>>
>> The `thread_name` field is a constant string that is constructed when
>> the context is created.  Using a (non-const) `strbuf` structure for it
>> caused some confusion in the past because it implied that someone
>> could rename a thread after it was created.  That usage was not
>> intended.  Changing it to a "flex array" will hopefully make the
>> intent more clear.
>
> Surely, "const struct strbuf name;" member would be an oxymoron, and
> I agree that this should follow "use strbuf as an easy-to-work-with
> mechanism to come up with a string, and bake the final value into a
> struct as a member of type 'const char []'" pattern.
>
> I recall saying why I thought the flex array was overkill, though.
>
> You have been storing an up-to-24-byte human readable name by
> embedding a strbuf that has two size_t plus a pointer (i.e. 24-bytes
> even on Windows), and as TR2_MAX_THREAD_NAME is capped at 24 bytes
> anyway, an embedded fixed-size thread_name[TR2_MAX_THREAD_NAME+1]
> member may be the simplest thing to do, I suspect.
>
> If we were to allow arbitrarily long thread_name[], which may not be
> a bad thing to do (e.g. we do not have to worry about truncation
> making two names ambiguous, for example), then the flex array is the
> right direction to go in, though.

We don't even need that, AFAICT. My reply at [1] is rather long, but the
tl;dr is that the interface for this API is:
	
	$ git grep '^\s+trace2_thread_start'
	Documentation/technical/api-trace2.txt: trace2_thread_start("preload_thread");
	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-health");
	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-listen");
	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-worker");
	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-accept");
	compat/simple-ipc/ipc-win32.c:  trace2_thread_start("ipc-server");
	t/helper/test-fsmonitor-client.c:       trace2_thread_start("hammer");
	t/helper/test-simple-ipc.c:     trace2_thread_start("multiple");
	trace2.h:       trace2_thread_start_fl((thread_hint), __FILE__, __LINE__)

And we are taking e.g. "preload_thread" and turning it into strings like
these, and saving it into "struct tr2tls_thread_ctx".

	"preload_thread", // main thread
	"th01:preload_thread", // 1st thread
	"th02:preload_thread" // 2nd thread
	[...]

So, we don't need to strdup() and store that "preload_thread" anywhere.
It's already a constant string we have hardcoded in the program. We just
need to save a pointer to it.

Then we just format "%s" (if ".thread_id" == 0) or "th%02d:%s"
(if ".thread_id" > 0) on-the-fly. The two codepaths that end up using
this are already using strbuf_addf(), so just adding to the format there
is easy.
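
In standalone C (snprintf() instead of strbuf_addf(), and with a made-up
"demo_lazy_ctx" struct and helper name, so an illustration rather than
the actual patch) the idea is roughly:

```c
#include <assert.h>
#include <stdio.h>

struct demo_lazy_ctx {
	/* Borrowed pointer: assumed to be a string literal in the caller. */
	const char *thread_hint;
	int thread_id;
};

/*
 * Build the display name at output time instead of storing it.
 * thread_id 0 (the main thread) gets the bare hint, everything else
 * gets a "thNN:" prefix. Returns the formatted length.
 */
static int demo_format_thread_name(char *out, size_t outsz,
				   const struct demo_lazy_ctx *ctx)
{
	assert(out && ctx && ctx->thread_hint);
	if (ctx->thread_id)
		return snprintf(out, outsz, "th%02d:%s",
				ctx->thread_id, ctx->thread_hint);
	return snprintf(out, outsz, "%s", ctx->thread_hint);
}
```

This only works as long as every caller really does pass a string that
outlives the thread context, which is the assumption being made here.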

1. https://lore.kernel.org/git/221005.86y1tus9ps.gmgdl@evledraar.gmail.com/


* Re: [PATCH 6/9] trace2: convert ctx.thread_name to flex array
  2022-10-06 21:05     ` Ævar Arnfjörð Bjarmason
@ 2022-10-06 21:50       ` Junio C Hamano
  2022-10-07  1:10         ` [RFC PATCH] trace2 API: don't save a copy of constant "thread_name" Ævar Arnfjörð Bjarmason
  2022-10-10 18:39       ` [PATCH 6/9] trace2: convert ctx.thread_name to flex array Jeff Hostetler
  1 sibling, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2022-10-06 21:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> If we were to allow arbitrarily long thread_name[], which may not be
>> a bad thing to do (e.g. we do not have to worry about truncation
>> making two names ambiguous, for example), then the flex array is the
>> right direction to go in, though.
>
> We don't even need that, AFAICT. ...
> ...
> And we are taking e.g. "preload_thread" and turning it into strings like
> these, and saving it into "struct tr2tls_thread_ctx".
>
> 	"preload_thread", // main thread
> 	"th01:preload_thread", // 1st thread
> 	"th02:preload_thread" // 2nd thread
> 	[...]
>
> So, we don't need to strdup() and store that "preload_thread" anywhere.
> It's already a constant string we have hardcoded in the program. We just
> need to save a pointer to it.

That sounds even simpler.


* [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"
  2022-10-06 21:50       ` Junio C Hamano
@ 2022-10-07  1:10         ` Ævar Arnfjörð Bjarmason
  2022-10-07  1:16           ` Junio C Hamano
  2022-10-10 19:05           ` Jeff Hostetler
  0 siblings, 2 replies; 73+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-07  1:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jeff Hostetler via GitGitGadget,
	Ævar Arnfjörð Bjarmason

Since ee4512ed481 (trace2: create new combined trace facility,
2019-02-22) the "thread_name" member of "struct tr2tls_thread_ctx" has
been copied from the caller, but those callers have always passed a
constant string:

	$ git -P grep '^\s*trace2_thread_start\('
	Documentation/technical/api-trace2.txt: trace2_thread_start("preload_thread");
	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-health");
	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-listen");
	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-worker");
	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-accept");
	compat/simple-ipc/ipc-win32.c:  trace2_thread_start("ipc-server");
	t/helper/test-fsmonitor-client.c:       trace2_thread_start("hammer");
	t/helper/test-simple-ipc.c:     trace2_thread_start("multiple");

This isn't needed for optimization, but apparently[1] there's been
some confusion about the non-const-ness of the previous "struct
strbuf".

Using the caller's string here makes this more straightforward, as
it's now clear that we're not dynamically constructing these. It's
also what the progress API does with its "title" string.

Since we know we're hardcoding these thread names let's BUG() out when
we see that the length of the name plus the length of the prefix would
exceed the maximum length for the "perf" format.

1. https://lore.kernel.org/git/82f1672e180afcd876505a4354bd9952f70db49e.1664900407.git.gitgitgadget@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

On Thu, Oct 06 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> So, we don't need to strdup() and store that "preload_thread" anywhere.
>> It's already a constant string we have hardcoded in the program. We just
>> need to save a pointer to it.
>
> That sounds even simpler.

A cleaned up version of the test code I had on top of "master", RFC
because I may still be missing some context here. E.g. maybe there's a
plan to dynamically construct these thread names?
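
For reference, the length check described in the commit message boils
down to something like the following standalone sketch. The limit value
and the names DEMO_PERF_MAX_THREAD_NAME and demo_thread_name_fits() are
assumptions for illustration; the patch itself would BUG() where this
returns 0:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical column width for the "perf" format. */
#define DEMO_PERF_MAX_THREAD_NAME 24

/*
 * Return 1 if "thNN:" (only added for non-zero thread ids) plus the
 * hardcoded name fits in the perf column, 0 otherwise. The prefix
 * length is measured with snprintf() so ids above 99 are counted
 * correctly too.
 */
static int demo_thread_name_fits(const char *thread_name, int thread_id)
{
	size_t prefix_len = 0;

	assert(thread_name);
	if (thread_id)
		prefix_len = (size_t)snprintf(NULL, 0, "th%02d:", thread_id);
	return prefix_len + strlen(thread_name) <= DEMO_PERF_MAX_THREAD_NAME;
}
```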

 json-writer.c          | 17 +++++++++++++++++
 json-writer.h          |  4 ++++
 trace2/tr2_tgt_event.c |  2 +-
 trace2/tr2_tgt_perf.c  | 10 +++++++---
 trace2/tr2_tls.c       | 14 +++++---------
 trace2/tr2_tls.h       |  9 +++++++--
 6 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/json-writer.c b/json-writer.c
index f1cfd8fa8c6..569a75bee51 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -161,6 +161,23 @@ void jw_object_string(struct json_writer *jw, const char *key, const char *value
 	append_quoted_string(&jw->json, value);
 }
 
+void jw_strbuf_add_thread_name(struct strbuf *sb, const char *thread_name,
+			       int thread_id)
+{
+	if (thread_id)
+		strbuf_addf(sb, "th%02d:", thread_id);
+	strbuf_addstr(sb, thread_name);
+}
+
+void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
+			     int thread_id)
+{
+	object_common(jw, "thread");
+	strbuf_addch(&jw->json, '"');
+	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
+	strbuf_addch(&jw->json, '"');
+}
+
 void jw_object_intmax(struct json_writer *jw, const char *key, intmax_t value)
 {
 	object_common(jw, key);
diff --git a/json-writer.h b/json-writer.h
index 209355e0f12..269c203b119 100644
--- a/json-writer.h
+++ b/json-writer.h
@@ -75,6 +75,10 @@ void jw_release(struct json_writer *jw);
 void jw_object_begin(struct json_writer *jw, int pretty);
 void jw_array_begin(struct json_writer *jw, int pretty);
 
+void jw_strbuf_add_thread_name(struct strbuf *buf, const char *thread_name,
+			       int thread_id);
+void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
+			     int thread_id);
 void jw_object_string(struct json_writer *jw, const char *key,
 		      const char *value);
 void jw_object_intmax(struct json_writer *jw, const char *key, intmax_t value);
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 37a3163be12..1308cf05df4 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -90,7 +90,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread", ctx->thread_name.buf);
+	jw_object_string_thread(jw, ctx->thread_name, ctx->thread_id);
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 8cb792488c8..ab21277eb36 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -69,6 +69,8 @@ static void perf_fmt_prepare(const char *event_name,
 			     const char *category, struct strbuf *buf)
 {
 	int len;
+	size_t oldlen;
+	int padlen;
 
 	strbuf_setlen(buf, 0);
 
@@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
 	}
 
 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
-	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
-		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
-		    event_name);
+	oldlen = buf->len;
+	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
+	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);
+	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
+		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
 
 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
 	if (repo)
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 7da94aba522..aa9aeb67fca 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -36,6 +36,9 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 {
 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
 
+	if (strlen(thread_name) + TR2_MAX_THREAD_NAME_PREFIX > TR2_MAX_THREAD_NAME)
+		BUG("too long thread name '%s'", thread_name);
+
 	/*
 	 * Implicitly "tr2tls_push_self()" to capture the thread's start
 	 * time in array_us_start[0].  For the main thread this gives us the
@@ -45,15 +48,9 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
 	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
 
+	ctx->thread_name = thread_name;
 	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
 
-	strbuf_init(&ctx->thread_name, 0);
-	if (ctx->thread_id)
-		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_name);
-	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
-		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
-
 	pthread_setspecific(tr2tls_key, ctx);
 
 	return ctx;
@@ -95,7 +92,6 @@ void tr2tls_unset_self(void)
 
 	pthread_setspecific(tr2tls_key, NULL);
 
-	strbuf_release(&ctx->thread_name);
 	free(ctx->array_us_start);
 	free(ctx);
 }
@@ -113,7 +109,7 @@ void tr2tls_pop_self(void)
 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 
 	if (!ctx->nr_open_regions)
-		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
+		BUG("no open regions in thread '%s'", ctx->thread_name);
 
 	ctx->nr_open_regions--;
 }
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index b1e327a928e..f600eb22551 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -4,12 +4,17 @@
 #include "strbuf.h"
 
 /*
- * Arbitry limit for thread names for column alignment.
+ * Arbitrary limit for thread names for column alignment. The overall
+ * max length is TR2_MAX_THREAD_NAME, and the
+ * TR2_MAX_THREAD_NAME_PREFIX is the length of the formatted
+ * '"th%02d:", ctx->thread_id' prefix which is added when "thread_id >
+ * 0".
  */
+#define TR2_MAX_THREAD_NAME_PREFIX (5)
 #define TR2_MAX_THREAD_NAME (24)
 
 struct tr2tls_thread_ctx {
-	struct strbuf thread_name;
+	const char *thread_name;
 	uint64_t *array_us_start;
 	int alloc;
 	int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
-- 
2.38.0.971.ge79ff6d20e7



* Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"
  2022-10-07  1:10         ` [RFC PATCH] trace2 API: don't save a copy of constant "thread_name" Ævar Arnfjörð Bjarmason
@ 2022-10-07  1:16           ` Junio C Hamano
  2022-10-07 10:03             ` Ævar Arnfjörð Bjarmason
  2022-10-10 19:05           ` Jeff Hostetler
  1 sibling, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2022-10-07  1:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff Hostetler, Jeff Hostetler via GitGitGadget

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> A cleaned up version of the test code I had on top of "master", RFC
> because I may still be missing some context here. E.g. maybe there's a
> plan to dynamically construct these thread names?

That's nice to learn, indeed.

> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
> +			     int thread_id)
> +{
> +	object_common(jw, "thread");
> +	strbuf_addch(&jw->json, '"');
> +	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
> +	strbuf_addch(&jw->json, '"');
> +}

...

> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>  	}
>  
>  	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
> -		    event_name);
> +	oldlen = buf->len;
> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
> +	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);
> +	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
> +		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);

Having to do strbuf_addf() many times may negatively affect perf_*
stuff, if this code is invoked in the hot path.  I however tend to
treat anything that involves an I/O not performance critical, and
this certainly falls into that category.
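
For illustration, the two padding strategies in the hunk above can be
compared outside of git. This is a standalone sketch that uses plain
snprintf() instead of the strbuf API; fmt_preformatted(),
fmt_on_the_fly() and MAX_THREAD_NAME are made-up names, not anything
in the tree:

```c
#include <stdio.h>

#define MAX_THREAD_NAME 24	/* stand-in for TR2_MAX_THREAD_NAME */

/* Pre-image style: pad an already formatted "thNN:name" string. */
static int fmt_preformatted(char *out, size_t sz, const char *full_name)
{
	return snprintf(out, sz, "%-*s | ", MAX_THREAD_NAME, full_name);
}

/*
 * Post-image style: write the optional prefix and the name, then pad
 * with an empty string. The caller must supply a buffer large enough
 * for the whole column; this sketch does not handle truncation.
 */
static int fmt_on_the_fly(char *out, size_t sz, const char *name,
			  int thread_id)
{
	int n = 0, padlen;

	if (thread_id)
		n += snprintf(out, sz, "th%02d:", thread_id);
	n += snprintf(out + n, sz - n, "%s", name);

	/*
	 * A negative width passed via '*' is read as the '-' flag plus a
	 * positive width, so clamp instead of relying on it.
	 */
	padlen = MAX_THREAD_NAME - n;
	if (padlen < 0)
		padlen = 0;
	n += snprintf(out + n, sz - n, "%-*s | ", padlen, "");
	return n;
}
```

Both helpers should produce the same column for a name that fits
within the limit; they only differ in how many formatting calls are
needed to get there.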



* Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"
  2022-10-07  1:16           ` Junio C Hamano
@ 2022-10-07 10:03             ` Ævar Arnfjörð Bjarmason
  2022-10-10 19:16               ` Jeff Hostetler
  0 siblings, 1 reply; 73+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-07 10:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff Hostetler, Jeff Hostetler via GitGitGadget


On Thu, Oct 06 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> A cleaned up version of the test code I had on top of "master", RFC
>> because I may still be missing some context here. E.g. maybe there's a
>> plan to dynamically construct these thread names?
>
> That's nice to learn, indeed.
>
>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
>> +			     int thread_id)
>> +{
>> +	object_common(jw, "thread");
>> +	strbuf_addch(&jw->json, '"');
>> +	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
>> +	strbuf_addch(&jw->json, '"');
>> +}
>
> ...
>
>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>>  	}
>>  
>>  	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
>> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
>> -		    event_name);
>> +	oldlen = buf->len;
>> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
>> +	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);
>> +	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
>> +		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
>
> Having to do strbuf_addf() many times may negatively affect perf_*
> stuff, if this code is invoked in the hot path.  I however tend to
> treat anything that involves an I/O not performance critical, and
> this certainly falls into that category.

Yes, and that function already called strbuf_addf() 5-7 times, this adds
one more, but only if "thread_id" is > 0.

The reason I added jw_object_string_thread() was to avoid the malloc() &
free() of a temporary "struct strbuf", it would have been more
straightforward to call jw_object_string() like that.

I don't think anyone cares about the raw performance of the "perf"
output, but the "JSON" one needs to be fast(er).

But even that output will malloc()/free() for each line it emits, and
often multiple times within one line (e.g. each time we format a
double).

So if we do want to optimize this in terms of memory use the lowest
hanging fruit seems to be to just have a per-thread "scratch" buffer
we'd write to, we could also observe that we're writing to a file and
just directly write to it in most cases (although we'd need to be
careful not to write partial-and-still-invalid JSON lines in that case...).


* Re: [PATCH 6/9] trace2: convert ctx.thread_name to flex array
  2022-10-05 11:14   ` Ævar Arnfjörð Bjarmason
  2022-10-06 16:28     ` Jeff Hostetler
@ 2022-10-10 18:31     ` Jeff Hostetler
  1 sibling, 0 replies; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-10 18:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler



On 10/5/22 7:14 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Oct 04 2022, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
>> to a "flex array" at the end of the context structure.
>>
>> The `thread_name` field is a constant string that is constructed when
>> the context is created.  Using a (non-const) `strbuf` structure for it
>> caused some confusion in the past because it implied that someone
>> could rename a thread after it was created.
> 
> I think it's been long enough that we could use a reminder about the
> "some confusion", i.e. if it was a bug report or something else.
> 
>> That usage was not intended.  Changing it to a "flex array" will
>> hopefully make the intent more clear.
> 
> I see we had some back & forth back in the original submission, although
> honestly I skimmed this this time around, had forgotten about that, and
> had this pop out at me, and then found my earlier comments.
> 
> I see that exchange didn't end as well as I'd hoped[1], and hopefully we
> can avoid that here. So having looked at this with fresh eyes maybe
> these comments/questions help:
> 
>   * I'm unable to bridge the gap from (paraphrased) "we must change the
>     type" to "mak[ing] the [read-only] intent more clear".
> 
>     I.e. if you go across the codebase and look at various non-const
>     "char name[FLEX_ARRAY]" and add a "const" to them you'll find cases
>     where we re-write the "FLEX_ARRAY" string, e.g. the one in archive.c
>     is one of those (the first grep hit, I stopped looking for others at
>     that point).
> 
>     Making it "const" will yield:
>     
>        archive.c: In function ‘queue_directory’:
>     archive.c:206:29: error: passing argument 1 of ‘xsnprintf’ discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
>       206 |         d->len = xsnprintf(d->path, len, "%.*s%s/", (int)base->len, base->buf, filename);
> 
>     So aside from anything else (and I may be misunderstanding this) why
>     does changing it to a FLEX_ARRAY give us the connotation in the
>     confused API user's mind that it shouldn't be messed with that the
>     "strbuf" doesn't give us?
[...]

My change in how we store the thread-name in the thread context was JUST
to clarify that it should be treated as a constant string and that code
should not try to modify it.  There was a comment to that effect last
year -- that having it be a strbuf invited one to modify it, when that
was not the intent.

That was all I was trying to do here.  Just make it "not be a strbuf".
Perhaps I leapt too far by making it a flex-array.  I probably could
have just changed the field to a "char *" and detached it from the
(now local) strbuf.  That would give the same impression, right?
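
To make the two storage options concrete, here is a minimal sketch of
both.  It is not taken from either patch: the struct and function
names are hypothetical and it uses plain malloc() rather than git's
FLEX_ALLOC_* or xmalloc() helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical context with the name stored inline (C99 flex array). */
struct ctx_flex {
	int thread_id;
	char thread_name[];	/* must be the last member */
};

/* Hypothetical context with the name stored behind a pointer. */
struct ctx_ptr {
	int thread_id;
	char *thread_name;	/* separately allocated copy */
};

/* One allocation covers the struct and the name. */
static struct ctx_flex *ctx_flex_new(int id, const char *name)
{
	size_t len = strlen(name);
	struct ctx_flex *ctx = malloc(sizeof(*ctx) + len + 1);

	if (!ctx)
		return NULL;
	ctx->thread_id = id;
	memcpy(ctx->thread_name, name, len + 1);
	return ctx;
}

/* Two allocations; the name has to be released separately. */
static struct ctx_ptr *ctx_ptr_new(int id, const char *name)
{
	size_t len = strlen(name);
	struct ctx_ptr *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->thread_name = malloc(len + 1);
	if (!ctx->thread_name) {
		free(ctx);
		return NULL;
	}
	ctx->thread_id = id;
	memcpy(ctx->thread_name, name, len + 1);
	return ctx;
}

static void ctx_ptr_free(struct ctx_ptr *ctx)
{
	if (!ctx)
		return;
	free(ctx->thread_name);
	free(ctx);
}
```

Neither variant stops a caller from writing through the field; both
only drop the strbuf API that made modification look intended.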


[...]
>>   	/*
>>   	 * Implicitly "tr2tls_push_self()" to capture the thread's start
>> @@ -45,15 +56,6 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
>>   	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
>>   	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
>>   
>> -	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>> -
>> -	strbuf_init(&ctx->thread_name, 0);
>> -	if (ctx->thread_id)
>> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
>> -	strbuf_addstr(&ctx->thread_name, name_hint);
>> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
>> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
>> -
>>   	pthread_setspecific(tr2tls_key, ctx);
>>   
>>   	return ctx;
> 
> I found this quote hard to follow because there's functional changes
> there mixed up with code re-arrangement, consider leading with a commit
> like:
[...]

sorry about that.  yes, there's a bit of churn here because i
needed to reorder the thread-name construction to be before we
allocated the context so that we'd know the buffer size.

and yes, i accidentally mixed in a function change to move the
truncation to the perf backend.

i'll redo all of this.


[...]
> <tries it out>
> 
> Anyway, if this area was actually performance critical and we *really
> cared* about avoiding allocations wouldn't we want to skip both the
> "strbuf" there and the "FLEX_ARRAY", and just save away the
> "thread_hint" (which the caller hardcodes) and "thread_nr", and then
> append on-the-fly?
> 
> I came up with the below to do that, it passes all tests, but contains
> micro-optimizations that I don't think we need (e.g. I understood you
> wanted to avoid printf, so it does that).
> 
> But I think it's a useful point of discussion. What test(s) do you have
> where the "master" version, FLEX_ARRAY version, and just not strbuf
> formatting the thing at all differ?
[...]

none of this was about micro-optimization.  i was just trying to get
the buffer away from a strbuf.  i still want it pre-formatted once
at thread-start, but that's it.

FWIW, I don't think having it formatted in each event helps anything.
it would have to go thru sprintf on every message.  it's much better
to just format it once in the thread-start.


[...]
> 	diff --git a/json-writer.c b/json-writer.c
[...] 	
> 	+void jw_strbuf_add_thread_name(struct strbuf *out, const char *thread_hint,
> 	+			       int thread_id, int max_len)
> 	+{
[...]
> 	+}
> 	+
> 	+void jw_object_thread(struct json_writer *jw, const char *thread_hint,
> 	+		      int thread_id)
> 	+{
[...]
> 	+}
[...]

We should not do this.  Just format the name in thread-start and
let json-writer print the string as we have been.

Adding thread formatting to json-writer also violates a separation
of concerns.

I'll re-roll this commit completely.

thanks
Jeff


* Re: [PATCH 6/9] trace2: convert ctx.thread_name to flex array
  2022-10-06 21:05     ` Ævar Arnfjörð Bjarmason
  2022-10-06 21:50       ` Junio C Hamano
@ 2022-10-10 18:39       ` Jeff Hostetler
  1 sibling, 0 replies; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-10 18:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Junio C Hamano
  Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler



On 10/6/22 5:05 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Oct 05 2022, Junio C Hamano wrote:
> 
>> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>
>>> Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
>>> to a "flex array" at the end of the context structure.
>>>
[...]
> 
> We don't even need that, AFAICT. My reply at [1] is rather long, but the
> tl;dr is that the interface for this API is:
> 	
> 	$ git grep '^\s+trace2_thread_start'
> 	Documentation/technical/api-trace2.txt: trace2_thread_start("preload_thread");
> 	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-health");
> 	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-listen");
> 	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-worker");
> 	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-accept");
> 	compat/simple-ipc/ipc-win32.c:  trace2_thread_start("ipc-server");
> 	t/helper/test-fsmonitor-client.c:       trace2_thread_start("hammer");
> 	t/helper/test-simple-ipc.c:     trace2_thread_start("multiple");
> 	trace2.h:       trace2_thread_start_fl((thread_hint), __FILE__, __LINE__)
> 
> And we are taking e.g. "preload_thread" and turning it into strings like
> these, and saving it into "struct tr2tls_thread_ctx".
> 
> 	"preload_thread", // main thread
> 	"th01:preload_thread", // 1st thread
> 	"th02:preload_thread" // 2nd thread
> 	[...]
> 
> So, we don't need to strdup() and store that "preload_thread" anywhere.
> It's already a constant string we have hardcoded in the program. We just
> need to save a pointer to it.

Current callers tend to pass a string literal.  There's nothing
to say that they will continue to do so in the future.


> Then we just format the "%s" (if ".thread_id" == 0) or "th%02d:%s"
> (if ".thread_id" > 0) on-the-fly, the two codepaths that end up using
> this are already using strbuf_addf(), so just adding to the format there
> is easy.
[...]

But then you'd be formatting this "th%02d:%s" on every message
printed.  Whereas we can format it once in the thread-start and
save the extra work -- at the expense of a string buffer in the
thread context.

Granted, the event handlers are generating output lines with many
"%" fields, so they are doing non-trivial amounts of work on every
event, but by using a pre-formatted thread-name, we don't need to
increase that workload.
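
A rough sketch of the "format once" approach, for illustration only.
The struct, thread_ctx_init() and emit_event() are hypothetical names
and the code uses snprintf() on a fixed buffer rather than a strbuf;
the 24-character limit mirrors TR2_MAX_THREAD_NAME.

```c
#include <stdio.h>

#define MAX_THREAD_NAME 24	/* stand-in for TR2_MAX_THREAD_NAME */

/* Hypothetical thread context holding the pre-formatted name. */
struct thread_ctx {
	int thread_id;
	char thread_name[MAX_THREAD_NAME + 1];
};

/* Done once, at thread start: build "thNN:hint" (or just "hint"). */
static void thread_ctx_init(struct thread_ctx *ctx, int thread_id,
			    const char *hint)
{
	ctx->thread_id = thread_id;
	/* snprintf() truncates to the buffer size, like the old setlen. */
	if (thread_id)
		snprintf(ctx->thread_name, sizeof(ctx->thread_name),
			 "th%02d:%s", thread_id, hint);
	else
		snprintf(ctx->thread_name, sizeof(ctx->thread_name),
			 "%s", hint);
}

/* Done for every event: the name is used as-is, only padded. */
static int emit_event(char *out, size_t sz, const struct thread_ctx *ctx,
		      const char *event)
{
	return snprintf(out, sz, "%-*s | %s", MAX_THREAD_NAME,
			ctx->thread_name, event);
}
```

The per-event path then only pays for the padding, at the cost of
about 25 bytes per thread context.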

Jeff




* Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"
  2022-10-07  1:10         ` [RFC PATCH] trace2 API: don't save a copy of constant "thread_name" Ævar Arnfjörð Bjarmason
  2022-10-07  1:16           ` Junio C Hamano
@ 2022-10-10 19:05           ` Jeff Hostetler
  2022-10-11 12:52             ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-10 19:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff Hostetler, Jeff Hostetler via GitGitGadget



On 10/6/22 9:10 PM, Ævar Arnfjörð Bjarmason wrote:
> Since ee4512ed481 (trace2: create new combined trace facility,
> 2019-02-22) the "thread_name" member of "struct tr2tls_thread_ctx" has
> been copied from the caller, but those callers have always passed a
> constant string:
> 
> 	$ git -P grep '^\s*trace2_thread_start\('
> 	Documentation/technical/api-trace2.txt: trace2_thread_start("preload_thread");
> 	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-health");
> 	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-listen");
> 	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-worker");
> 	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-accept");
> 	compat/simple-ipc/ipc-win32.c:  trace2_thread_start("ipc-server");
> 	t/helper/test-fsmonitor-client.c:       trace2_thread_start("hammer");
> 	t/helper/test-simple-ipc.c:     trace2_thread_start("multiple");
> 
> This isn't needed for optimization, but apparently[1] there's been
> some confusion about the non-const-ness of the previous "struct
> strbuf".
> 
> Using the caller's string here makes this more straightforward, as
> it's now clear that we're not dynamically constructing these. It's
> also what the progress API does with its "title" string.
> 
> Since we know we're hardcoding these thread names let's BUG() out when
> we see that the length of the name plus the length of the prefix would
> exceed the maximum length for the "perf" format.
> 
> 1. https://lore.kernel.org/git/82f1672e180afcd876505a4354bd9952f70db49e.1664900407.git.gitgitgadget@gmail.com/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

PLEASE DON'T DO THIS.

If you don't like my patch, fine.  Let's discuss it.  But DON'T submit
a new one to replace it.  Or worse, try to inject it into the middle
of an existing series.


Yes, current callers are passing a string literal and thread-start
could take a "const char*" to it, but there is no way to guarantee
that that is safe if someone decides to dynamically construct their
thread-name and pass it in (since we don't know the lifetime of that
pointer).  So it is safer to copy it into the thread context so that
it can be used by later trace messages.
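
To illustrate the lifetime concern, here is a small standalone sketch
(hypothetical struct and function names, plain malloc() instead of
xstrdup()).  It contrasts keeping the caller's pointer with keeping a
private copy:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical context that borrows the caller's string. */
struct ctx_borrowed {
	const char *thread_name;
};

/* Hypothetical context that owns a private copy. */
struct ctx_owned {
	char *thread_name;
};

/* Only safe while the caller's buffer stays alive and unchanged. */
static void start_borrowed(struct ctx_borrowed *ctx, const char *name)
{
	ctx->thread_name = name;
}

/* Safe regardless of what the caller does with its buffer later. */
static int start_owned(struct ctx_owned *ctx, const char *name)
{
	size_t len = strlen(name) + 1;

	ctx->thread_name = malloc(len);
	if (!ctx->thread_name)
		return -1;
	memcpy(ctx->thread_name, name, len);
	return 0;
}
```

With a string literal both behave the same.  With a caller-owned
buffer that is later reused, only the owned copy keeps the original
name.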


[...]
> +void jw_strbuf_add_thread_name(struct strbuf *buf, const char *thread_name,
> +			       int thread_id);
> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
> +			     int thread_id);

This violates a separation of concerns.  json-writer is ONLY concerned
with formatting valid JSON from basic data types.  It does not know
about threads or thread contexts.

`jw_strbuf_add_thread_name()` also violates the json-writer conventions
-- that it takes a "struct json_writer *" pointer.  There is nothing
about JSON here.

You might write a helper (inside of tr2_tgt_event.c) that formats a
thread-name from the id and hint, but that is specific to the Event
target -- not to JSON, nor the JSON writer.

But then again, why make every trace message from every target format
that "th%02d:%s" when we could save some time and format it in the
thread-start and just USE it?


[...]
> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>   	}
>   
>   	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
> -		    event_name);
> +	oldlen = buf->len;
> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);

This stands out as very wrong.  The _Perf target does not use JSON
at all, yet here we are calling a jw_ routine.  Again, that code is
in the wrong place.


I'm going to clip the rest of this commit, since the above invalidates
it.

Jeff


* Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"
  2022-10-07 10:03             ` Ævar Arnfjörð Bjarmason
@ 2022-10-10 19:16               ` Jeff Hostetler
  2022-10-11 13:31                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-10 19:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Junio C Hamano
  Cc: git, Jeff Hostetler, Jeff Hostetler via GitGitGadget



On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Oct 06 2022, Junio C Hamano wrote:
> 
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> A cleaned up version of the test code I had on top of "master", RFC
>>> because I may still be missing some context here. E.g. maybe there's a
>>> plan to dynamically construct these thread names?
>>
>> That's nice to learn, indeed.
>>
>>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
>>> +			     int thread_id)
>>> +{
>>> +	object_common(jw, "thread");
>>> +	strbuf_addch(&jw->json, '"');
>>> +	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
>>> +	strbuf_addch(&jw->json, '"');
>>> +}
>>
>> ...
>>
>>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>>>   	}
>>>   
>>>   	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>>> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
>>> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
>>> -		    event_name);
>>> +	oldlen = buf->len;
>>> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
>>> +	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);
>>> +	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
>>> +		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
>>
>> Having to do strbuf_addf() many times may negatively affect perf_*
>> stuff, if this code is invoked in the hot path.  I however tend to
>> treat anything that involves an I/O not performance critical, and
>> this certainly falls into that category.
> 
> Yes, and that function already called strbuf_addf() 5-7 times, this adds
> one more, but only if "thread_id" is > 0.
> 
> The reason I added jw_object_string_thread() was to avoid the malloc() &
> free() of a temporary "struct strbuf", it would have been more
> straightforward to call jw_object_string() like that.
> 
> I don't think anyone cares about the raw performance of the "perf"
> output, but the "JSON" one needs to be fast(er).
> 
> But even that output will malloc()/free() for each line it emits, and
> often multiple times within one line (e.g. each time we format a
> double).
> 
> So if we do want to optimize this in terms of memory use the lowest
> hanging fruit seems to be to just have a per-thread "scratch" buffer
> we'd write to, we could also observe that we're writing to a file and
> just directly write to it in most cases (although we'd need to be
> careful not to write partial-and-still-invalid JSON lines in that case...).
> 

WRT optimizing memory usage.  We're talking about ~25 byte buffer
per thread.  Most commands execute in 1 thread -- if they read the
index they may have ~10 threads (depending on the size of the index
and if preload-index is enabled).  So, I don't think we really need
to optimize this.  Threading is used extensively in fsmonitor-daemon,
but it creates a fixed thread-pool at startup, so it may have ~12
threads.  Again, not worth optimizing for the thread-name field.

Now, if you want to optimize over all trace2 events (a completely
different topic), you could create a large scratch strbuf buffer in
each thread context and use it so that we don't have to malloc/free
during each trace message.  That might be worthwhile.


We must not do partial writes to the trace2 files as we're
constructing fields.  The trace2 files are opened with O_APPEND
so that we get the atomic lseek(2)+write(2) so that lines get
written without overwrites when multiple threads and/or processes
are tracing.

Also, when writing to a named pipe, we get "message" semantics
on write() boundaries, which makes post-processing easier.
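
As a sketch of the "build the whole line, then write it once" pattern
described above (POSIX only; open_trace_file() and emit_line() are
hypothetical helpers, not trace2 functions, and the line format is
made up):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Open with O_APPEND so that each write() lands at the current end of
 * the file, even when other threads or processes append too.
 */
static int open_trace_file(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
}

/* Format the complete line in memory and emit it with one write(). */
static int emit_line(int fd, const char *thread_name, const char *event)
{
	char line[256];
	int len = snprintf(line, sizeof(line), "%s | %s\n",
			   thread_name, event);

	if (len < 0 || (size_t)len >= sizeof(line))
		return -1;	/* refuse to emit a truncated line */
	if (write(fd, line, (size_t)len) != (ssize_t)len)
		return -1;	/* short or failed write */
	return 0;
}
```

Writing field by field would instead give other writers a chance to
interleave their output in the middle of a line.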

Jeff


* Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"
  2022-10-10 19:05           ` Jeff Hostetler
@ 2022-10-11 12:52             ` Ævar Arnfjörð Bjarmason
  2022-10-11 14:40               ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-11 12:52 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, Junio C Hamano, Jeff Hostetler,
	Jeff Hostetler via GitGitGadget


On Mon, Oct 10 2022, Jeff Hostetler wrote:

> On 10/6/22 9:10 PM, Ævar Arnfjörð Bjarmason wrote:
>> Since ee4512ed481 (trace2: create new combined trace facility,
>> 2019-02-22) the "thread_name" member of "struct tr2tls_thread_ctx" has
>> been copied from the caller, but those callers have always passed a
>> constant string:
>> 	$ git -P grep '^\s*trace2_thread_start\('
>> 	Documentation/technical/api-trace2.txt: trace2_thread_start("preload_thread");
>> 	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-health");
>> 	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-listen");
>> 	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-worker");
>> 	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-accept");
>> 	compat/simple-ipc/ipc-win32.c:  trace2_thread_start("ipc-server");
>> 	t/helper/test-fsmonitor-client.c:       trace2_thread_start("hammer");
>> 	t/helper/test-simple-ipc.c:     trace2_thread_start("multiple");
>> This isn't needed for optimization, but apparently[1] there's been
>> some confusion about the non-const-ness of the previous "struct
>> strbuf".
>> Using the caller's string here makes this more straightforward, as
>> it's now clear that we're not dynamically constructing these. It's
>> also what the progress API does with its "title" string.
>> Since we know we're hardcoding these thread names let's BUG() out when
>> we see that the length of the name plus the length of the prefix would
>> exceed the maximum length for the "perf" format.
>> 1. https://lore.kernel.org/git/82f1672e180afcd876505a4354bd9952f70db49e.1664900407.git.gitgitgadget@gmail.com/
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>
> PLEASE DON'T DO THIS.
>
> If you don't like my patch, fine.  Let's discuss it.  But DON'T submit
> a new one to replace it.  Or worse, try to inject it into the middle
> of an existing series.

I'm not seeking to replace your series, or to tick you off, sorry if it
came across like that.

I just thought (and still think) that we were at a point in the
discussion where it seemed clear that I wasn't quite managing to get
across to you what I meant, so sending that in the form of working code
should clarify things.

Per Junio's "That's nice to learn, indeed." in
<xmqqo7uoh1q0.fsf@gitster.g> it seems to have had that intended effect
on him. It's marked as an RFC, so not-a-thing-to-pick-up, but just for
discussion.

> Yes, current callers are passing a string literal and thread-start
> could take a "const char*" to it, but there is no way to guarantee
> that that is safe if someone decides to dynamically construct their
> thread-name and pass it in (since we don't know the lifetime of that
> pointer).  So it is safer to copy it into the thread context so that
> it can be used by later trace messages.

I think that's a defensible opinion, but I also think it's fair to say
that:

 * This seems to be *the* motivation for why you're doing things the way
   you're doing them, and at least to this reviewer that wasn't really
   coming across...

 * ...nor the context of why we'd need that sort of guarded API in this
   case, but not e.g. for another widely-used API like start_progress().

   See 791afae2924 (progress.c tests: make start/stop commands on stdin,
   2022-02-03) for a case where we're using that where we need to work
   around its behavior (and no, I didn't make the underlying API that
   way, it's just a commit of mine where I'm having to work with it).

I think designing our internal APIs to not be quite so guarded is fine,
and we do that in various other contexts (progress, etc.). We control
both the API and its users, so just leaving a "this must be a constant"
should be enough.

But even if you want to be paranoid about it there are much easier ways to
do that which give you more of the safety you seem to want. E.g. this on
top of master (and easily adjusted on top of this RFC patch):
	
	diff --git a/trace2.h b/trace2.h
	index 88d906ea830..1c3a98fb30f 100644
	--- a/trace2.h
	+++ b/trace2.h
	@@ -306,12 +306,18 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code);
	  *
	  * Thread names should be descriptive, like "preload_index".
	  * Thread names will be decorated with an instance number automatically.
	+ * Thread names must point to data that won't change after it's passed
	+ * into this function. Once trace2_thread_exit() is called it can be
	+ * free'd.
	  */
	 void trace2_thread_start_fl(const char *file, int line,
	 			    const char *thread_name);
	 
	+/*
	+ * The "" is to assure us that API users pass only constant strings
	+ */
	 #define trace2_thread_start(thread_name) \
	-	trace2_thread_start_fl(__FILE__, __LINE__, (thread_name))
	+	trace2_thread_start_fl(__FILE__, __LINE__, (thread_name ""))
	 
	 /*
	  * Emit a 'thread_exit' event.  This must be called from inside the

That will compile, as we only pass it constant strings, but if someone
were to pass a variable it'll fail to compile, at which point we could
provide some inline macro/function that would do the required xstrdup().
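
To show the string-literal trick in isolation, here is a minimal
standalone sketch. The demo_* names are hypothetical stand-ins and are
not part of trace2; the stand-in function only records the pointer it is
given, which is the "borrow, don't copy" behavior under discussion.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for trace2_thread_start_fl(). */
static const char *last_thread_name;

static void demo_thread_start_fl(const char *file, int line,
				 const char *thread_name)
{
	(void)file;
	(void)line;
	last_thread_name = thread_name; /* borrowed, not copied */
}

/*
 * Adjacent string literals are concatenated by the compiler, so
 * ("preload_index" "") is simply "preload_index".  A non-literal
 * argument, such as a "char *" variable, expands to (var "") which
 * is a syntax error, so misuse is caught at compile time.
 */
#define demo_thread_start(thread_name) \
	demo_thread_start_fl(__FILE__, __LINE__, (thread_name ""))

/* Helper showing a valid call site; returns the recorded name. */
static const char *demo_start_literal(void)
{
	demo_thread_start("preload_index");
	return last_thread_name;
}
```

Replacing the literal in demo_start_literal() with a variable should make
the translation unit fail to compile, which is the point of the macro.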

All of which I think is *still* being too paranoid, but which I think
*if* you want the paranoia is much more explicit about what we're trying
to accomplish with said paranoia, and where the compiler will help you.

> [...]
>> +void jw_strbuf_add_thread_name(struct strbuf *buf, const char *thread_name,
>> +			       int thread_id);
>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
>> +			     int thread_id);
>
> This violates a separation of concerns.  json-writer is ONLY concerned
> with formatting valid JSON from basic data types.  It does not know
> about threads or thread contexts.
>
> `js_strbuf_add_thread_name()` also violates the json-writer conventions
> -- that it takes a "struct json_writer *" pointer.  There is nothing
> about JSON here.
>
> You might write a helper (inside of tr2_tgt_event.c) that formats a
> thread-name from the id and hint, but that is specific to the Event
> target -- not to JSON, nor the JSON writer.

That's fair, more on that below.

> But then again, why make every trace message from every target format
> that "th%0d:%s" when we could save some time and format it in the
> thread-start and just USE it.

If you actually care about this being faster -- and the only reason for
posting this RFC patch is to try to tease out *why* that is -- then this
part of your concern can be trivially mitigated by having a struct
member like:

	char thread_id_str[3];

We'd then just snprintf() into that in tr2tls_create_self(). Then when
we print the thread to the JSON or log you'd do so without any
strbuf_addf(), just a strbuf_addstr() or strbuf_add().

I think that micro-optimization isn't needed in this case, but it *is*
easy to do.
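
A minimal standalone sketch of that idea, under stated assumptions: the
demo_* names are hypothetical, plain char buffers stand in for strbuf,
and the prefix buffer is sized at 16 bytes here (my assumption) so that
it can hold a whole "thNN:" prefix rather than only the digits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_thread_ctx {
	int thread_id;
	const char *thread_name; /* borrowed constant, e.g. "preload_index" */
	char thread_id_str[16];  /* "thNN:" prefix, formatted once */
};

/* Thread-start path: the only place that uses printf-style formatting. */
static void demo_ctx_init(struct demo_thread_ctx *ctx, int thread_id,
			  const char *thread_name)
{
	ctx->thread_id = thread_id;
	ctx->thread_name = thread_name;
	ctx->thread_id_str[0] = '\0';
	if (thread_id)
		snprintf(ctx->thread_id_str, sizeof(ctx->thread_id_str),
			 "th%02d:", thread_id);
}

/*
 * Append "s" at offset "pos", truncating if needed.
 * Requires outsz > 0 and pos < outsz.  Returns the new length.
 */
static size_t demo_append(char *out, size_t outsz, size_t pos, const char *s)
{
	size_t len = strlen(s);

	if (len > outsz - pos - 1)
		len = outsz - pos - 1;
	memcpy(out + pos, s, len);
	out[pos + len] = '\0';
	return pos + len;
}

/* Per-message path: two plain copies, no format parsing. */
static size_t demo_append_thread(char *out, size_t outsz,
				 const struct demo_thread_ctx *ctx)
{
	size_t pos = demo_append(out, outsz, 0, ctx->thread_id_str);

	return demo_append(out, outsz, pos, ctx->thread_name);
}
```

With thread id 10 and name "fsm-listen" this yields "th10:fsm-listen";
for the main thread (id 0) the prefix stays empty.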

> [...]
>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>>   	}
>>     	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
>> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
>> -		    event_name);
>> +	oldlen = buf->len;
>> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
>
> This stands out as very wrong.  The _Perf target does not use JSON
> at all, yet here we are calling a jw_ routine.  Again, that code is
> in the wrong place.
>
> I'm going to clip the rest of this commit, since the above invalidates
> it.

A helper function being in the wrong place invalidates the whole commit?

I think you're right that this jw_strbuf_add_thread_name() helper should
live somewhere else, probably in thread-utils.c.

So, pretending that it's in whatever place you'd be comfortable with,
and using whatever naming convention you'd prefer. What do you think
about the rest of the commit?

You snipped it just as you were getting to the meaty part of it, namely:

 * With this approach we can BUG() out as soon as we try to construct
   the main thread if its name is bad, we don't need to wait until
   runtime when a child thread runs into the limit.

 * We no longer need the whole thread-creation-time string duplication,
   associated storage in the struct etc.

 * That struct member is "const", addressing your initial concern of
   (from the upthread commit message):

	Using a (non-const) `strbuf` structure for it caused some
	confusion in the past because it implied that someone could
	rename a thread after it was created.  That usage was not
	intended.

   Although I think (and I'm possibly misreading it) that your
   commentary here is saying that even that's not enough, i.e. we can't
   just leave it at a "const" here, but must assume that an API user
   will disregard that and modify it after it's passed to us anyway.


* Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"
  2022-10-10 19:16               ` Jeff Hostetler
@ 2022-10-11 13:31                 ` Ævar Arnfjörð Bjarmason
  2022-10-12 13:31                   ` Jeff Hostetler
  0 siblings, 1 reply; 73+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-11 13:31 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, git, Jeff Hostetler,
	Jeff Hostetler via GitGitGadget


On Mon, Oct 10 2022, Jeff Hostetler wrote:

> On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Oct 06 2022, Junio C Hamano wrote:
>> 
>>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>>
>>>> A cleaned up version of the test code I had on top of "master", RFC
>>>> because I may still be missing some context here. E.g. maybe there's a
>>>> plan to dynamically construct these thread names?
>>>
>>> That's nice to learn, indeed.
>>>
>>>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
>>>> +			     int thread_id)
>>>> +{
>>>> +	object_common(jw, "thread");
>>>> +	strbuf_addch(&jw->json, '"');
>>>> +	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
>>>> +	strbuf_addch(&jw->json, '"');
>>>> +}
>>>
>>> ...
>>>
>>>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>>>>   	}
>>>>     	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>>>> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
>>>> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
>>>> -		    event_name);
>>>> +	oldlen = buf->len;
>>>> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
>>>> +	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);;
>>>> +	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
>>>> +		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
>>>
>>> Having to do strbuf_addf() many times may negatively affect perf_*
>>> stuff, if this code is invoked in the hot path.  I however tend to
>>> treat anything that involves an I/O not performance critical, and
>>> this certainly falls into that category.
>> Yes, and that function already called strbuf_addf() 5-7 times, this
>> adds
>> one more, but only if "thread_id" is > 0.
>> The reason I added jw_object_string_thread() was to avoid the
>> malloc() &
>> free() of a temporary "struct strbuf", it would have been more
>> straightforward to call jw_object_string() like that.
>> I don't think anyone cares about the raw performance of the "perf"
>> output, but the "JSON" one needs to be fast(er).
>> But even that output will malloc()/free() for each line it emits,
>> and
>> often multiple times within one line (e.g. each time we format a
>> double).
>> So if we do want to optimize this in terms of memory use the lowest
>> hanging fruit seems to be to just have a per-thread "scratch" buffer
>> we'd write to, we could also observe that we're writing to a file and
>> just directly write to it in most cases (although we'd need to be
>> careful to write partial-and-still-invalid JSON lines in that case...).
>> 

I left more extensive commentary in the side-thread in
https://lore.kernel.org/git/221011.86lepmo5dn.gmgdl@evledraar.gmail.com/,
just a quick reply here.

> WRT optimizing memory usage.  We're talking about ~25 byte buffer
> per thread.  Most commands execute in 1 thread -- if they read the
> index they may have ~10 threads (depending on the size of the index
> and if preload-index is enabled).  So, I don't think we really need
> to optimize this.  Threading is used extensively in fsmonitor-daemon,
> but it creates a fixed thread-pool at startup, so it may have ~12
> threads.  Again, not worth optimizing for the thread-name field.

Yes, I agree it's not worth optimizing.

The reason for commenting on this part is that it isn't clear to me why
your proposed patch then isn't doing the more obvious "it's not worth
optimizing" pattern, per Junio's [1] comment on the initial version.

The "flex array" method is seemingly taking pains to reduce the runtime
memory use of these by embedding this string in the space reserved for
the struct.

So it's just meant as a question for you & the proposed patch.

> Now, if you want to optimize over all trace2 events (a completely
> different topic), you could create a large scratch strbuf buffer in
> each thread context and use it so that we don't have to malloc/free
> during each trace message.  That might be worth while.

*nod*

> We must not do partial writes to the trace2 files as we're
> constructing fields.  The trace2 files are opened with O_APPEND
> so that we get the atomic lseek(2)+write(2) so that lines get
> written without overwrites when multiple threads and/or processes
> are tracing.
>
> Also, when writing to a named pipe, we get "message" semantics
> on write() boundaries, which makes post-processing easier.

*nod*
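
For readers following along, the "no partial writes" constraint can be
sketched roughly as below. This is an illustration only, not the trace2
implementation: the demo_* names, the 256-byte line buffer and the JSON
shape are all assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open (or create) a trace file so that every write lands at its end. */
static int demo_open_trace(const char *path)
{
	return open(path, O_WRONLY | O_APPEND | O_CREAT, 0666);
}

/*
 * Format one complete event line into a local buffer, then hand it to
 * the kernel with a single write(2).  Returns 0 on success and -1 on
 * error; a line that does not fit is dropped rather than emitted in
 * pieces.
 */
static int demo_emit_line(int fd, const char *thread, const char *event)
{
	char line[256];
	int len = snprintf(line, sizeof(line),
			   "{\"thread\":\"%s\",\"event\":\"%s\"}\n",
			   thread, event);

	if (len < 0 || (size_t)len >= sizeof(line))
		return -1;
	return write(fd, line, (size_t)len) == (ssize_t)len ? 0 : -1;
}
```

The design choice is that fields are only ever assembled in memory; the
file descriptor sees whole lines.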

1. https://lore.kernel.org/git/xmqq8rwcjttq.fsf@gitster.g/
2. https://lore.kernel.org/git/RFC-patch-1.1-8563d017137-20221007T010829Z-avarab@gmail.com/


* Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"
  2022-10-11 12:52             ` Ævar Arnfjörð Bjarmason
@ 2022-10-11 14:40               ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2022-10-11 14:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff Hostetler, git, Jeff Hostetler,
	Jeff Hostetler via GitGitGadget

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Per Junio's "That's nice to learn, indeed." in
> <xmqqo7uoh1q0.fsf@gitster.g> it seems to have had that intended effect
> on him.

I was commenting on the goal, i.e. you "may still be missing some
context here, maybe there's a plan to ...", and I meant that the
plan of the overall effort is something that is nice to learn before
going further.  I was not endorsing the method you are taking to
achieve that goal, though.

FWIW, I find my code sent in as a comment easier to read than my
prose alone for any topic, but that is only because "my" code is
easy to read for "me".  I am sure others would find it an
unnecessary burden to figure out what the alternative/replacement I
send out intends to do and why it does so in the way it does, and
would rather appreciate it if I explained these things in prose that
is easy to understand and rich in "why", which alternative/replacement
code alone would lack.  A code snippet helps illustrate points on
"how", but is often a poor replacement for proper explanation
because it is a bad medium to convey "why".

Same would apply to your code.  For others, including me, it often
is a lot of work to figure out what your code is trying to do, and
more importantly why it does what it tries to do in the way it does.

After all, when you are having a hard time communicating why you want
to do things differently from the patch author in prose, a code
snippet would probably be the worst primary medium to do so, because
it is full of "how exactly" with little "why".




* Re: [RFC PATCH] trace2 API: don't save a copy of constant "thread_name"
  2022-10-11 13:31                 ` Ævar Arnfjörð Bjarmason
@ 2022-10-12 13:31                   ` Jeff Hostetler
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-12 13:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Jeff Hostetler,
	Jeff Hostetler via GitGitGadget



On 10/11/22 9:31 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Oct 10 2022, Jeff Hostetler wrote:
> 
>> On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote:
>>> On Thu, Oct 06 2022, Junio C Hamano wrote:
>>>
>>>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>>>
>>>>> A cleaned up version of the test code I had on top of "master", RFC
>>>>> because I may still be missing some context here. E.g. maybe there's a
>>>>> plan to dynamically construct these thread names?
>>>>
[...]

> I left more extensive commentary in the side-thread in
> https://lore.kernel.org/git/221011.86lepmo5dn.gmgdl@evledraar.gmail.com/,
> just a quick reply here.
> 
>> WRT optimizing memory usage.  We're talking about ~25 byte buffer
>> per thread.  Most commands execute in 1 thread -- if they read the
>> index they may have ~10 threads (depending on the size of the index
>> and if preload-index is enabled).  So, I don't think we really need
>> to optimize this.  Threading is used extensively in fsmonitor-daemon,
>> but it creates a fixed thread-pool at startup, so it may have ~12
>> threads.  Again, not worth optimizing for the thread-name field.
> 
> Yes, I agree it's not worth optimizing.
> 
> The reason for commenting on this part is that it isn't clear to me why
> your proposed patch then isn't doing the more obvious "it's not worth
> optimizing" pattern, per Junio's [1] comment on the initial version.
> 
> The "flex array" method is seemingly taking pains to reduce the runtime
> memory use of these by embedding this string in the space reserved for
> the struct.
> 
> So it's just meant as a question for you & the proposed patch.

I think we're converging on some common understanding (and I
think we've gone around on this topic more than enough).  :-)

I really was just trying to get rid of the strbuf and make it
a fixed string -- I chose a flex-array rather than just detaching
the buffer from a local in the thread-start code.  I should have
done the latter.  I saw the flex-array as a fixed-size object
that can't be replaced or extended (without recreating the
thread-local storage) -- yes, people could overwrite existing
bytes in-place in the flex-array, but who does that??


I understood what you were asking (illustrated in your RFC).
That is, I understood the "what/how" you wanted to do to refactor /
redesign the field, but I couldn't understand the "why".  That
is, why you've taken such interest in this field (and such
a relatively unimportant change).  We've spent nearly a week
discussing it and we both agree that the optimization that I
didn't suggest isn't worth doing.  (I'm paraphrasing slightly.) :-)

So, rather than continuing with the back-n-forth, let me skip
over the remaining questions in this thread and prepare a re-roll.
Hopefully, I can simplify and more clearly explain the method to
my madness and we can move on.


>> Now, if you want to optimize over all trace2 events (a completely
>> different topic), you could create a large scratch strbuf buffer in
>> each thread context and use it so that we don't have to malloc/free
>> during each trace message.  That might be worth while.
> 
> *nod*

I'll make a note to revisit this idea in a future series.

Thanks
Jeff



* [PATCH v2 0/7] Trace2 timers and counters and some cleanup
  2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                   ` (10 preceding siblings ...)
  2022-10-06 18:12 ` Derrick Stolee
@ 2022-10-12 18:52 ` Jeff Hostetler via GitGitGadget
  2022-10-12 18:52   ` [PATCH v2 1/7] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
                     ` (7 more replies)
  11 siblings, 8 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-12 18:52 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler

Here is version 2 of this series to add timers and counters to Trace2.

Changes since V1:

 * I dropped the commits concerning compiler errors in Clang 11.0.0 on
   MacOS. I've sent them to the mailing list in a separate series, since
   they had nothing to do with the main topic of this series.

 * I moved the documentation changes earlier in the series to get it out of
   the way (and eliminate the need to update it in later commits).

 * After a long conversation on the mailing list, I redid the two
   thread-name commits to simplify and hopefully eliminate the remaining
   misunderstandings and/or short-comings of my previous attempt and
   explanations. We now use a "const char *" for the field in the thread-ctx
   that we format and detach from a strbuf during thread-start. The goal
   here is to move away from a modifiable strbuf in the thread-ctx itself
   (to avoid giving the appearance that a caller could modify the
   thread-name at some point, when that was not intended).
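
As a rough standalone illustration of the resulting ownership model (not
the patch itself): the name is built in a temporary buffer at thread
start, handed to the context as a "const char *", and freed when the
context goes away. The demo_* names are hypothetical, and snprintf() plus
malloc() stand in for the strbuf format-and-detach step.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_THREAD_NAME 24

struct demo_thread_ctx {
	const char *thread_name; /* owned by the ctx, fixed after creation */
	int thread_id;
};

static struct demo_thread_ctx *demo_ctx_create(int thread_id,
					       const char *thread_base_name)
{
	struct demo_thread_ctx *ctx = calloc(1, sizeof(*ctx));
	char buf[DEMO_MAX_THREAD_NAME + 1];
	char *name;

	if (!ctx)
		return NULL;

	/* snprintf() truncates to at most DEMO_MAX_THREAD_NAME chars. */
	if (thread_id)
		snprintf(buf, sizeof(buf), "th%02d:%s", thread_id,
			 thread_base_name);
	else
		snprintf(buf, sizeof(buf), "%s", thread_base_name);

	/* Hand a heap copy to the ctx, like detaching from a strbuf. */
	name = malloc(strlen(buf) + 1);
	if (!name) {
		free(ctx);
		return NULL;
	}
	strcpy(name, buf);

	ctx->thread_id = thread_id;
	ctx->thread_name = name;
	return ctx;
}

static void demo_ctx_free(struct demo_thread_ctx *ctx)
{
	if (!ctx)
		return;
	free((char *)ctx->thread_name); /* cast away const only to free */
	free(ctx);
}
```

Callers can read ctx->thread_name for the lifetime of the context, but
the const qualifier signals that nobody is expected to rewrite it.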

The last 2 commits add the stopwatch timers and the global counters and are
unchanged from the previous version.

Jeff Hostetler (7):
  trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  tr2tls: clarify TLS terminology
  api-trace2.txt: elminate section describing the public trace2 API
  trace2: rename the thread_name argument to trace2_thread_start
  trace2: convert ctx.thread_name from strbuf to pointer
  trace2: add stopwatch timers
  trace2: add global counter mechanism

 Documentation/technical/api-trace2.txt | 190 +++++++++++++++++--------
 Makefile                               |   2 +
 t/helper/test-trace2.c                 | 187 ++++++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  95 +++++++++++++
 t/t0211/scrub_perf.perl                |   6 +
 trace2.c                               | 121 +++++++++++++++-
 trace2.h                               | 101 +++++++++++--
 trace2/tr2_ctr.c                       | 101 +++++++++++++
 trace2/tr2_ctr.h                       | 104 ++++++++++++++
 trace2/tr2_tgt.h                       |  14 ++
 trace2/tr2_tgt_event.c                 |  47 +++++-
 trace2/tr2_tgt_normal.c                |  39 +++++
 trace2/tr2_tgt_perf.c                  |  43 +++++-
 trace2/tr2_tls.c                       |  34 +++--
 trace2/tr2_tls.h                       |  55 ++++---
 trace2/tr2_tmr.c                       | 182 +++++++++++++++++++++++
 trace2/tr2_tmr.h                       | 140 ++++++++++++++++++
 17 files changed, 1359 insertions(+), 102 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h


base-commit: 3dcec76d9df911ed8321007b1d197c1a206dc164
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1373%2Fjeffhostetler%2Ftrace2-stopwatch-v4-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1373/jeffhostetler/trace2-stopwatch-v4-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1373

Range-diff vs v1:

  1:  870f29166ea <  -:  ----------- builtin/merge-file: fix compiler warning on MacOS with clang 11.0.0
  2:  43c41f7035d <  -:  ----------- builtin/unpack-objects.c: fix compiler warning on MacOS with clang 11.0.0
  3:  73704b6f660 =  1:  6e7e4f3187e trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  4:  7123886a804 =  2:  9dee7a75903 tr2tls: clarify TLS terminology
  7:  77a4daf9a4b !  3:  804dab9e1a7 api-trace2.txt: elminate section describing the public trace2 API
     @@ Documentation/technical/api-trace2.txt: take a `va_list` argument.
      -
      -These messages are concerned with Git thread usage.
      -
     --e.g: `void trace2_thread_start(const char *name_hint)`.
     +-e.g: `void trace2_thread_start(const char *thread_name)`.
      -
      -=== Region and Data Messages
      -
  5:  82f1672e180 !  4:  637b422b860 trace2: rename trace2 thread_name argument as name_hint
     @@ Metadata
      Author: Jeff Hostetler <jeffhost@microsoft.com>
      
       ## Commit message ##
     -    trace2: rename trace2 thread_name argument as name_hint
     +    trace2: rename the thread_name argument to trace2_thread_start
      
     -    Rename the `thread_name` argument in `tr2tls_create_self()`
     -    and `trace2_thread_start()` to be `name_hint` to make it clear
     -    that the passed argument is a hint that will be used to create
     +    Rename the `thread_name` argument in `tr2tls_create_self()` and
     +    `trace2_thread_start()` to be `thread_base_name` to make it clearer
     +    that the passed argument is a component used in the construction of
          the actual `struct tr2tls_thread_ctx.thread_name` variable.
      
     -    This should make it clearer in the API that the trace2 layer
     -    does not borrow the caller's string pointer/buffer, but rather
     -    that it will use that hint in formatting the actual thread's
     -    name.  Previous discussion on the mailing list indicated that
     -    there was confusion about this point.
     +    The base name will be used along with the thread id to create a
     +    unique thread name.
      
          This commit does not change how the `thread_name` field is
          allocated or stored within the `tr2tls_thread_ctx` structure.
      
          Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
      
     - ## Documentation/technical/api-trace2.txt ##
     -@@ Documentation/technical/api-trace2.txt: e.g: `void trace2_child_start(struct child_process *cmd)`.
     - 
     - These messages are concerned with Git thread usage.
     - 
     --e.g: `void trace2_thread_start(const char *thread_name)`.
     -+e.g: `void trace2_thread_start(const char *name_hint)`.
     - 
     - === Region and Data Messages
     - 
     -
       ## trace2.c ##
      @@ trace2.c: void trace2_exec_result_fl(const char *file, int line, int exec_id, int code)
       				file, line, us_elapsed_absolute, exec_id, code);
       }
       
      -void trace2_thread_start_fl(const char *file, int line, const char *thread_name)
     -+void trace2_thread_start_fl(const char *file, int line, const char *name_hint)
     ++void trace2_thread_start_fl(const char *file, int line, const char *thread_base_name)
       {
       	struct tr2_tgt *tgt_j;
       	int j;
     @@ trace2.c: void trace2_thread_start_fl(const char *file, int line, const char *th
       		trace2_region_enter_printf_fl(file, line, NULL, NULL, NULL,
       					      "thread-proc on main: %s",
      -					      thread_name);
     -+					      name_hint);
     ++					      thread_base_name);
       		return;
       	}
       
     @@ trace2.c: void trace2_thread_start_fl(const char *file, int line, const char *th
       	us_elapsed_absolute = tr2tls_absolute_elapsed(us_now);
       
      -	tr2tls_create_self(thread_name, us_now);
     -+	tr2tls_create_self(name_hint, us_now);
     ++	tr2tls_create_self(thread_base_name, us_now);
       
       	for_each_wanted_builtin (j, tgt_j)
       		if (tgt_j->pfn_thread_start_fl)
     @@ trace2.h: void trace2_exec_result_fl(const char *file, int line, int exec_id, in
        *
      - * Thread names should be descriptive, like "preload_index".
      - * Thread names will be decorated with an instance number automatically.
     -+ * The thread name hint should be descriptive, like "preload_index" or
     ++ * The thread base name should be descriptive, like "preload_index" or
      + * taken from the thread-proc function.  A unique thread name will be
     -+ * created from the hint and the thread id automatically.
     ++ * created from the given base name and the thread id automatically.
        */
       void trace2_thread_start_fl(const char *file, int line,
      -			    const char *thread_name);
     -+			    const char *name_hint);
     ++			    const char *thread_base_name);
       
      -#define trace2_thread_start(thread_name) \
      -	trace2_thread_start_fl(__FILE__, __LINE__, (thread_name))
     -+#define trace2_thread_start(name_hint) \
     -+	trace2_thread_start_fl(__FILE__, __LINE__, (name_hint))
     ++#define trace2_thread_start(thread_base_name) \
     ++	trace2_thread_start_fl(__FILE__, __LINE__, (thread_base_name))
       
       /*
        * Emit a 'thread_exit' event.  This must be called from inside the
     @@ trace2/tr2_tls.c: void tr2tls_start_process_clock(void)
       }
       
      -struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
     -+struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
     ++struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
       					     uint64_t us_thread_start)
       {
       	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
     @@ trace2/tr2_tls.c: struct tr2tls_thread_ctx *tr2tls_create_self(const char *threa
       	if (ctx->thread_id)
       		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
      -	strbuf_addstr(&ctx->thread_name, thread_name);
     -+	strbuf_addstr(&ctx->thread_name, name_hint);
     ++	strbuf_addstr(&ctx->thread_name, thread_base_name);
       	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
       		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
       
     @@ trace2/tr2_tls.h: struct tr2tls_thread_ctx {
      + * The first thread in the process will have:
      + *     { .thread_id=0, .thread_name="main" }
      + * Subsequent threads are given a non-zero thread_id and a thread_name
     -+ * constructed from the id and a "name hint" (which is usually based
     -+ * upon the name of the thread-proc function).  For example:
     ++ * constructed from the id and a thread base name (which is usually just
     ++ * the name of the thread-proc function).  For example:
      + *     { .thread_id=10, .thread_name="th10fsm-listen" }
      + * This helps to identify and distinguish messages from concurrent threads.
      + * The ctx.thread_name field is truncated if necessary to help with column
     @@ trace2/tr2_tls.h: struct tr2tls_thread_ctx {
        * current thread.
        */
      -struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
     -+struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
     ++struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
       					     uint64_t us_thread_start);
       
       /*
  6:  6492b6d2b98 !  5:  4bf78e356e2 trace2: convert ctx.thread_name to flex array
     @@ Metadata
      Author: Jeff Hostetler <jeffhost@microsoft.com>
      
       ## Commit message ##
     -    trace2: convert ctx.thread_name to flex array
     +    trace2: convert ctx.thread_name from strbuf to pointer
      
          Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
     -    to a "flex array" at the end of the context structure.
     +    to a "const char*" pointer.
      
          The `thread_name` field is a constant string that is constructed when
          the context is created.  Using a (non-const) `strbuf` structure for it
          caused some confusion in the past because it implied that someone
          could rename a thread after it was created.  That usage was not
     -    intended.  Changing it to a "flex array" will hopefully make the
     -    intent more clear.
     -
     -    Also, move the maximum thread_name truncation to tr2_tgt_perf.c
     -    because it is the only target that needs to worry about output column
     -    alignment.
     +    intended.  Change it to a const pointer to make the intent more clear.
      
          Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
      
     @@ trace2/tr2_tgt_event.c: static void event_fmt_prepare(const char *event_name, co
       	 * In brief mode, only emit <time> on these 2 event types.
      
       ## trace2/tr2_tgt_perf.c ##
     -@@ trace2/tr2_tgt_perf.c: static int tr2env_perf_be_brief;
     - 
     - #define TR2FMT_PERF_FL_WIDTH (28)
     - #define TR2FMT_PERF_MAX_EVENT_NAME (12)
     -+#define TR2FMT_PERF_MAX_THREAD_NAME (24)
     - #define TR2FMT_PERF_REPO_WIDTH (3)
     - #define TR2FMT_PERF_CATEGORY_WIDTH (12)
     - 
      @@ trace2/tr2_tgt_perf.c: static void perf_fmt_prepare(const char *event_name,
     - 	}
       
       	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
     --	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
     + 	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
      -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
     -+	strbuf_addf(buf, "%-*.*s | %-*s | ",
     -+		    TR2FMT_PERF_MAX_THREAD_NAME,
     -+		    TR2FMT_PERF_MAX_THREAD_NAME,
     -+		    ctx->thread_name,
     -+		    TR2FMT_PERF_MAX_EVENT_NAME,
     ++		    ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
       		    event_name);
       
       	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
      
       ## trace2/tr2_tls.c ##
     -@@ trace2/tr2_tls.c: void tr2tls_start_process_clock(void)
     - struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
     +@@ trace2/tr2_tls.c: struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
       					     uint64_t us_thread_start)
       {
     --	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
     -+	struct tr2tls_thread_ctx *ctx;
     -+	struct strbuf buf_name = STRBUF_INIT;
     -+	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
     -+
     -+	if (thread_id)
     -+		strbuf_addf(&buf_name, "th%02d:", thread_id);
     -+	strbuf_addstr(&buf_name, name_hint);
     -+
     -+	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
     -+	strbuf_release(&buf_name);
     -+
     -+	ctx->thread_id = thread_id;
     + 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
     ++	struct strbuf buf = STRBUF_INIT;
       
       	/*
       	 * Implicitly "tr2tls_push_self()" to capture the thread's start
     -@@ trace2/tr2_tls.c: struct tr2tls_thread_ctx *tr2tls_create_self(const char *name_hint,
     - 	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
     - 	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
     +@@ trace2/tr2_tls.c: struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
     + 
     + 	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
       
     --	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
     --
      -	strbuf_init(&ctx->thread_name, 0);
     --	if (ctx->thread_id)
     ++	strbuf_init(&buf, 0);
     + 	if (ctx->thread_id)
      -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
     --	strbuf_addstr(&ctx->thread_name, name_hint);
     +-	strbuf_addstr(&ctx->thread_name, thread_base_name);
      -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
      -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
     --
     ++		strbuf_addf(&buf, "th%02d:", ctx->thread_id);
     ++	strbuf_addstr(&buf, thread_base_name);
     ++	if (buf.len > TR2_MAX_THREAD_NAME)
     ++		strbuf_setlen(&buf, TR2_MAX_THREAD_NAME);
     ++	ctx->thread_name = strbuf_detach(&buf, NULL);
     + 
       	pthread_setspecific(tr2tls_key, ctx);
       
     - 	return ctx;
      @@ trace2/tr2_tls.c: void tr2tls_unset_self(void)
       
       	pthread_setspecific(tr2tls_key, NULL);
       
      -	strbuf_release(&ctx->thread_name);
     ++	free((char *)ctx->thread_name);
       	free(ctx->array_us_start);
       	free(ctx);
       }
     @@ trace2/tr2_tls.c: void tr2tls_pop_self(void)
      
       ## trace2/tr2_tls.h ##
      @@
     -  * There is NO relation to "transport layer security".
     -  */
     + #define TR2_MAX_THREAD_NAME (24)
       
     --/*
     -- * Arbitry limit for thread names for column alignment.
     -- */
     --#define TR2_MAX_THREAD_NAME (24)
     --
       struct tr2tls_thread_ctx {
      -	struct strbuf thread_name;
     ++	const char *thread_name;
       	uint64_t *array_us_start;
       	size_t alloc;
       	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
     - 	int thread_id;
     -+	char thread_name[FLEX_ARRAY];
     - };
     - 
     - /*
     -@@ trace2/tr2_tls.h: struct tr2tls_thread_ctx {
     -  * upon the name of the thread-proc function).  For example:
     -  *     { .thread_id=10, .thread_name="th10fsm-listen" }
     -  * This helps to identify and distinguish messages from concurrent threads.
     -- * The ctx.thread_name field is truncated if necessary to help with column
     -- * alignment in printf-style messages.
     -  *
     -  * In this and all following functions the term "self" refers to the
     -  * current thread.
  8:  19c7bba91ba !  6:  dd6d8e2841b trace2: add stopwatch timers
     @@ trace2/tr2_tls.h: struct tr2tls_thread_ctx {
      +	struct tr2_timer_block timer_block;
      +	unsigned int used_any_timer:1;
      +	unsigned int used_any_per_thread_timer:1;
     - 	char thread_name[FLEX_ARRAY];
       };
       
     + /*
      @@ trace2/tr2_tls.h: int tr2tls_locked_increment(int *p);
        */
       void tr2tls_start_process_clock(void);
  9:  2bf7cb1f8d0 !  7:  cf012fcde37 trace2: add global counter mechanism
     @@ trace2/tr2_tls.h: struct tr2tls_thread_ctx {
       	unsigned int used_any_per_thread_timer:1;
      +	unsigned int used_any_counter:1;
      +	unsigned int used_any_per_thread_counter:1;
     - 	char thread_name[FLEX_ARRAY];
       };
       
     + /*

-- 
gitgitgadget


* [PATCH v2 1/7] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
@ 2022-10-12 18:52   ` Jeff Hostetler via GitGitGadget
  2022-10-12 18:52   ` [PATCH v2 2/7] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-12 18:52 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use "size_t" rather than "int" for the "alloc" and "nr_open_regions"
fields in the "tr2tls_thread_ctx".  These are used by ALLOC_GROW().
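
As an illustration of why these two fields belong together as "size_t",
here is a minimal stand-alone sketch of a growable array of region start
times.  It is not the Git code: "struct region_stack" and
"region_stack_push()" are hypothetical names, and the growth policy is
only modeled loosely on what ALLOC_GROW() does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the array fields of tr2tls_thread_ctx. */
struct region_stack {
	uint64_t *array_us_start;
	size_t alloc;           /* allocated slots */
	size_t nr_open_regions; /* used slots; the "nr" of the pair */
};

/* Push a start time, growing the array when it is full. */
static void region_stack_push(struct region_stack *s, uint64_t us_start)
{
	assert(s);

	if (s->nr_open_regions + 1 > s->alloc) {
		/* Assumed growth policy for this sketch only. */
		size_t new_alloc = (s->alloc + 16) * 3 / 2;
		uint64_t *p = realloc(s->array_us_start,
				      new_alloc * sizeof(*p));
		if (!p)
			abort();
		s->array_us_start = p;
		s->alloc = new_alloc;
	}
	s->array_us_start[s->nr_open_regions++] = us_start;
}
```

Keeping both counters as "size_t" means the comparison and the size
computation above are done in a single unsigned type, with no
signed/unsigned mixing.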

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tls.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index b1e327a928e..a90bd639d48 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -11,8 +11,8 @@
 struct tr2tls_thread_ctx {
 	struct strbuf thread_name;
 	uint64_t *array_us_start;
-	int alloc;
-	int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
+	size_t alloc;
+	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 };
 
-- 
gitgitgadget



* [PATCH v2 2/7] tr2tls: clarify TLS terminology
  2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
  2022-10-12 18:52   ` [PATCH v2 1/7] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
@ 2022-10-12 18:52   ` Jeff Hostetler via GitGitGadget
  2022-10-13 21:12     ` Junio C Hamano
  2022-10-12 18:52   ` [PATCH v2 3/7] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-12 18:52 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Reduce or eliminate use of the term "TLS" in the Trace2 code.

The term "TLS" has two popular meanings: "thread-local storage" and
"transport layer security".  In the Trace2 source, the term is associated
with the former.  There was concern on the mailing list about it referring
to the latter.

Update the source and documentation to eliminate the use of the "TLS" term
or replace it with the phrase "thread-local storage" to reduce ambiguity.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt |  8 ++++----
 trace2.c                               |  2 +-
 trace2.h                               | 10 +++++-----
 trace2/tr2_tls.c                       |  6 +++---
 trace2/tr2_tls.h                       | 18 +++++++++++-------
 5 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 2afa28bb5aa..431d424f9d5 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -685,8 +685,8 @@ The "exec_id" field is a command-unique id and is only useful if the
 
 `"thread_start"`::
 	This event is generated when a thread is started.  It is
-	generated from *within* the new thread's thread-proc (for TLS
-	reasons).
+	generated from *within* the new thread's thread-proc (because
+	it needs to access data in the thread's thread-local storage).
 +
 ------------
 {
@@ -698,7 +698,7 @@ The "exec_id" field is a command-unique id and is only useful if the
 
 `"thread_exit"`::
 	This event is generated when a thread exits.  It is generated
-	from *within* the thread's thread-proc (for TLS reasons).
+	from *within* the thread's thread-proc.
 +
 ------------
 {
@@ -1206,7 +1206,7 @@ worked on 508 items at offset 2032.  Thread "th04" worked on 508 items
 at offset 508.
 +
 This example also shows that thread names are assigned in a racy manner
-as each thread starts and allocates TLS storage.
+as each thread starts.
 
 Config (def param) Events::
 
diff --git a/trace2.c b/trace2.c
index 0c0a11e07d5..c1244e45ace 100644
--- a/trace2.c
+++ b/trace2.c
@@ -52,7 +52,7 @@ static struct tr2_tgt *tr2_tgt_builtins[] =
  * Force (rather than lazily) initialize any of the requested
  * builtin TRACE2 targets at startup (and before we've seen an
  * actual TRACE2 event call) so we can see if we need to setup
- * the TR2 and TLS machinery.
+ * private data structures and thread-local storage.
  *
  * Return the number of builtin targets enabled.
  */
diff --git a/trace2.h b/trace2.h
index 88d906ea830..af3c11694cc 100644
--- a/trace2.h
+++ b/trace2.h
@@ -73,8 +73,7 @@ void trace2_initialize_clock(void);
 /*
  * Initialize TRACE2 tracing facility if any of the builtin TRACE2
  * targets are enabled in the system config or the environment.
- * This includes setting up the Trace2 thread local storage (TLS).
- * Emits a 'version' message containing the version of git
+ * This emits a 'version' message containing the version of git
  * and the Trace2 protocol.
  *
  * This function should be called from `main()` as early as possible in
@@ -302,7 +301,8 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code);
 
 /*
  * Emit a 'thread_start' event.  This must be called from inside the
- * thread-proc to set up the trace2 TLS data for the thread.
+ * thread-proc to allow the thread to create its own thread-local
+ * storage.
  *
  * Thread names should be descriptive, like "preload_index".
  * Thread names will be decorated with an instance number automatically.
@@ -315,8 +315,8 @@ void trace2_thread_start_fl(const char *file, int line,
 
 /*
  * Emit a 'thread_exit' event.  This must be called from inside the
- * thread-proc to report thread-specific data and cleanup TLS data
- * for the thread.
+ * thread-proc so that the thread can access and clean up its
+ * thread-local storage.
  */
 void trace2_thread_exit_fl(const char *file, int line);
 
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 7da94aba522..8d2182fbdbb 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -69,9 +69,9 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void)
 	ctx = pthread_getspecific(tr2tls_key);
 
 	/*
-	 * If the thread-proc did not call trace2_thread_start(), we won't
-	 * have any TLS data associated with the current thread.  Fix it
-	 * here and silently continue.
+	 * If the current thread's thread-proc did not call
+	 * trace2_thread_start(), then the thread will not have any
+	 * thread-local storage.  Create it now and silently continue.
 	 */
 	if (!ctx)
 		ctx = tr2tls_create_self("unknown", getnanotime() / 1000);
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index a90bd639d48..1297509fd23 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -3,6 +3,12 @@
 
 #include "strbuf.h"
 
+/*
+ * Notice: the term "TLS" refers to "thread-local storage" in the
+ * Trace2 source files.  This usage is borrowed from GCC and Windows.
+ * There is NO relation to "transport layer security".
+ */
+
 /*
  * Arbitry limit for thread names for column alignment.
  */
@@ -17,9 +23,7 @@ struct tr2tls_thread_ctx {
 };
 
 /*
- * Create TLS data for the current thread.  This gives us a place to
- * put per-thread data, such as thread start time, function nesting
- * and a per-thread label for our messages.
+ * Create thread-local storage for the current thread.
  *
  * We assume the first thread is "main".  Other threads are given
  * non-zero thread-ids to help distinguish messages from concurrent
@@ -35,7 +39,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 					     uint64_t us_thread_start);
 
 /*
- * Get our TLS data.
+ * Get the thread-local storage pointer of the current thread.
  */
 struct tr2tls_thread_ctx *tr2tls_get_self(void);
 
@@ -45,7 +49,7 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void);
 int tr2tls_is_main_thread(void);
 
 /*
- * Free our TLS data.
+ * Free the current thread's thread-local storage.
  */
 void tr2tls_unset_self(void);
 
@@ -81,12 +85,12 @@ uint64_t tr2tls_region_elasped_self(uint64_t us);
 uint64_t tr2tls_absolute_elapsed(uint64_t us);
 
 /*
- * Initialize the tr2 TLS system.
+ * Initialize thread-local storage for Trace2.
  */
 void tr2tls_init(void);
 
 /*
- * Free all tr2 TLS resources.
+ * Free all Trace2 thread-local storage resources.
  */
 void tr2tls_release(void);
 
-- 
gitgitgadget



* [PATCH v2 3/7] api-trace2.txt: elminate section describing the public trace2 API
  2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
  2022-10-12 18:52   ` [PATCH v2 1/7] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
  2022-10-12 18:52   ` [PATCH v2 2/7] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
@ 2022-10-12 18:52   ` Jeff Hostetler via GitGitGadget
  2022-10-12 18:52   ` [PATCH v2 4/7] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-12 18:52 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Eliminate the mostly obsolete `Public API` sub-section from the
`Trace2 API` section in the documentation.  Strengthen the referral
to `trace2.h`.

Most of the technical information in this sub-section was moved to
`trace2.h` in 6c51cb525d (trace2: move doc to trace2.h, 2019-11-17) to
be adjacent to the function prototypes.  The remaining text wasn't
that useful by itself.

Furthermore, the text would need a bit of overhaul to add routines
that do not immediately generate a message, such as stopwatch timers.
So it seemed simpler to just get rid of it.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt | 61 +++-----------------------
 1 file changed, 7 insertions(+), 54 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 431d424f9d5..9d43909d068 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -148,20 +148,18 @@ filename collisions).
 
 == Trace2 API
 
-All public Trace2 functions and macros are defined in `trace2.h` and
-`trace2.c`.  All public symbols are prefixed with `trace2_`.
+The Trace2 public API is defined and documented in `trace2.h`; refer to it for
+more information.  All public functions and macros are prefixed
+with `trace2_` and are implemented in `trace2.c`.
 
 There are no public Trace2 data structures.
 
 The Trace2 code also defines a set of private functions and data types
 in the `trace2/` directory.  These symbols are prefixed with `tr2_`
-and should only be used by functions in `trace2.c`.
+and should only be used by functions in `trace2.c` (or other private
+source files in `trace2/`).
 
-== Conventions for Public Functions and Macros
-
-The functions defined by the Trace2 API are declared and documented
-in `trace2.h`.  It defines the API functions and wrapper macros for
-Trace2.
+=== Conventions for Public Functions and Macros
 
 Some functions have a `_fl()` suffix to indicate that they take `file`
 and `line-number` arguments.
@@ -172,52 +170,7 @@ take a `va_list` argument.
 Some functions have a `_printf_fl()` suffix to indicate that they also
 take a `printf()` style format with a variable number of arguments.
 
-There are CPP wrapper macros and `#ifdef`s to hide most of these details.
-See `trace2.h` for more details.  The following discussion will only
-describe the simplified forms.
-
-== Public API
-
-All Trace2 API functions send a message to all of the active
-Trace2 Targets.  This section describes the set of available
-messages.
-
-It helps to divide these functions into groups for discussion
-purposes.
-
-=== Basic Command Messages
-
-These are concerned with the lifetime of the overall git process.
-e.g: `void trace2_initialize_clock()`, `void trace2_initialize()`,
-`int trace2_is_enabled()`, `void trace2_cmd_start(int argc, const char **argv)`.
-
-=== Command Detail Messages
-
-These are concerned with describing the specific Git command
-after the command line, config, and environment are inspected.
-e.g: `void trace2_cmd_name(const char *name)`,
-`void trace2_cmd_mode(const char *mode)`.
-
-=== Child Process Messages
-
-These are concerned with the various spawned child processes,
-including shell scripts, git commands, editors, pagers, and hooks.
-
-e.g: `void trace2_child_start(struct child_process *cmd)`.
-
-=== Git Thread Messages
-
-These messages are concerned with Git thread usage.
-
-e.g: `void trace2_thread_start(const char *thread_name)`.
-
-=== Region and Data Messages
-
-These are concerned with recording performance data
-over regions or spans of code. e.g:
-`void trace2_region_enter(const char *category, const char *label, const struct repository *repo)`.
-
-Refer to trace2.h for details about all trace2 functions.
+CPP wrapper macros are defined to hide most of these details.
 
 == Trace2 Target Formats
 
-- 
gitgitgadget



* [PATCH v2 4/7] trace2: rename the thread_name argument to trace2_thread_start
  2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-10-12 18:52   ` [PATCH v2 3/7] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
@ 2022-10-12 18:52   ` Jeff Hostetler via GitGitGadget
  2022-10-12 21:06     ` Ævar Arnfjörð Bjarmason
  2022-10-13 21:12     ` Junio C Hamano
  2022-10-12 18:52   ` [PATCH v2 5/7] trace2: convert ctx.thread_name from strbuf to pointer Jeff Hostetler via GitGitGadget
                     ` (3 subsequent siblings)
  7 siblings, 2 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-12 18:52 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Rename the `thread_name` argument in `tr2tls_create_self()` and
`trace2_thread_start()` to be `thread_base_name` to make it clearer
that the passed argument is a component used in the construction of
the actual `struct tr2tls_thread_ctx.thread_name` variable.

The base name will be used along with the thread id to create a
unique thread name.
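
The construction can be sketched in stand-alone C as follows.  This is
only an approximation of what tr2tls_create_self() does with a strbuf:
"build_thread_name()" and "MAX_THREAD_NAME" are hypothetical names, and
snprintf() is used here both to format and to truncate.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_THREAD_NAME 24 /* same value as TR2_MAX_THREAD_NAME */

/*
 * Hypothetical helper: combine the thread id and the base name into
 * the decorated thread name.  Thread id 0 (the main thread) gets no
 * prefix.  Returns the length of the resulting name.
 */
static size_t build_thread_name(char out[MAX_THREAD_NAME + 1], int thread_id,
				const char *thread_base_name)
{
	assert(out && thread_base_name);

	/* snprintf() writes at most MAX_THREAD_NAME bytes plus the NUL. */
	if (thread_id)
		snprintf(out, MAX_THREAD_NAME + 1, "th%02d:%s", thread_id,
			 thread_base_name);
	else
		snprintf(out, MAX_THREAD_NAME + 1, "%s", thread_base_name);

	return strlen(out);
}
```

The caller only supplies the base name; the id prefix and the
truncation to the column width are applied in one place.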

This commit does not change how the `thread_name` field is
allocated or stored within the `tr2tls_thread_ctx` structure.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2.c         |  6 +++---
 trace2.h         | 11 ++++++-----
 trace2/tr2_tls.c |  4 ++--
 trace2/tr2_tls.h | 17 ++++++++++-------
 4 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/trace2.c b/trace2.c
index c1244e45ace..165264dc79a 100644
--- a/trace2.c
+++ b/trace2.c
@@ -466,7 +466,7 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code)
 				file, line, us_elapsed_absolute, exec_id, code);
 }
 
-void trace2_thread_start_fl(const char *file, int line, const char *thread_name)
+void trace2_thread_start_fl(const char *file, int line, const char *thread_base_name)
 {
 	struct tr2_tgt *tgt_j;
 	int j;
@@ -488,14 +488,14 @@ void trace2_thread_start_fl(const char *file, int line, const char *thread_name)
 		 */
 		trace2_region_enter_printf_fl(file, line, NULL, NULL, NULL,
 					      "thread-proc on main: %s",
-					      thread_name);
+					      thread_base_name);
 		return;
 	}
 
 	us_now = getnanotime() / 1000;
 	us_elapsed_absolute = tr2tls_absolute_elapsed(us_now);
 
-	tr2tls_create_self(thread_name, us_now);
+	tr2tls_create_self(thread_base_name, us_now);
 
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_thread_start_fl)
diff --git a/trace2.h b/trace2.h
index af3c11694cc..74cdb1354f7 100644
--- a/trace2.h
+++ b/trace2.h
@@ -304,14 +304,15 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code);
  * thread-proc to allow the thread to create its own thread-local
  * storage.
  *
- * Thread names should be descriptive, like "preload_index".
- * Thread names will be decorated with an instance number automatically.
+ * The thread base name should be descriptive, like "preload_index" or
+ * taken from the thread-proc function.  A unique thread name will be
+ * created from the given base name and the thread id automatically.
  */
 void trace2_thread_start_fl(const char *file, int line,
-			    const char *thread_name);
+			    const char *thread_base_name);
 
-#define trace2_thread_start(thread_name) \
-	trace2_thread_start_fl(__FILE__, __LINE__, (thread_name))
+#define trace2_thread_start(thread_base_name) \
+	trace2_thread_start_fl(__FILE__, __LINE__, (thread_base_name))
 
 /*
  * Emit a 'thread_exit' event.  This must be called from inside the
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 8d2182fbdbb..4f7c516ecb6 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -31,7 +31,7 @@ void tr2tls_start_process_clock(void)
 	tr2tls_us_start_process = getnanotime() / 1000;
 }
 
-struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
+struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 					     uint64_t us_thread_start)
 {
 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
@@ -50,7 +50,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 	strbuf_init(&ctx->thread_name, 0);
 	if (ctx->thread_id)
 		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_name);
+	strbuf_addstr(&ctx->thread_name, thread_base_name);
 	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
 		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
 
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 1297509fd23..7d1f03a2ea6 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -25,17 +25,20 @@ struct tr2tls_thread_ctx {
 /*
  * Create thread-local storage for the current thread.
  *
- * We assume the first thread is "main".  Other threads are given
- * non-zero thread-ids to help distinguish messages from concurrent
- * threads.
- *
- * Truncate the thread name if necessary to help with column alignment
- * in printf-style messages.
+ * The first thread in the process will have:
+ *     { .thread_id=0, .thread_name="main" }
+ * Subsequent threads are given a non-zero thread_id and a thread_name
+ * constructed from the id and a thread base name (which is usually just
+ * the name of the thread-proc function).  For example:
+ *     { .thread_id=10, .thread_name="th10fsm-listen" }
+ * This helps to identify and distinguish messages from concurrent threads.
+ * The ctx.thread_name field is truncated if necessary to help with column
+ * alignment in printf-style messages.
  *
  * In this and all following functions the term "self" refers to the
  * current thread.
  */
-struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
+struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 					     uint64_t us_thread_start);
 
 /*
-- 
gitgitgadget



* [PATCH v2 5/7] trace2: convert ctx.thread_name from strbuf to pointer
  2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-10-12 18:52   ` [PATCH v2 4/7] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
@ 2022-10-12 18:52   ` Jeff Hostetler via GitGitGadget
  2022-10-13 21:12     ` Junio C Hamano
  2022-10-12 18:52   ` [PATCH v2 6/7] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-12 18:52 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
to a "const char*" pointer.

The `thread_name` field is a constant string that is constructed when
the context is created.  Using a (non-const) `strbuf` structure for it
caused some confusion in the past because it implied that someone
could rename a thread after it was created.  That usage was not
intended.  Change it to a const pointer to make the intent clearer.
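
The ownership pattern can be sketched in stand-alone C like this.  It is
not the Git code: "struct thread_ctx", "thread_ctx_create()",
"thread_ctx_free()" and "must_alloc()" are hypothetical names, plain
libc calls stand in for the strbuf API, and truncation to
TR2_MAX_THREAD_NAME is left out to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct tr2tls_thread_ctx. */
struct thread_ctx {
	const char *thread_name; /* built once at creation, read-only after */
	int thread_id;
};

/* Allocate zeroed memory or abort, in the spirit of Git's x*() wrappers. */
static void *must_alloc(size_t size)
{
	void *p = calloc(1, size);
	if (!p)
		abort();
	return p;
}

static struct thread_ctx *thread_ctx_create(int thread_id,
					    const char *base_name)
{
	struct thread_ctx *ctx;
	char *name; /* mutable only while the name is being built */
	size_t size;

	assert(base_name);
	ctx = must_alloc(sizeof(*ctx));

	/* Room for "th", the id digits, ":", the base name and the NUL. */
	size = strlen(base_name) + 32;
	name = must_alloc(size);
	if (thread_id)
		snprintf(name, size, "th%02d:%s", thread_id, base_name);
	else
		snprintf(name, size, "%s", base_name);

	ctx->thread_id = thread_id;
	ctx->thread_name = name; /* ownership moves to the context */
	return ctx;
}

static void thread_ctx_free(struct thread_ctx *ctx)
{
	if (!ctx)
		return;
	free((char *)ctx->thread_name); /* cast away const only to free */
	free(ctx);
}
```

The name is writable only through the local pointer during
construction; once stored in the context, users of the structure see a
"const char *" and cannot modify it without a cast.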

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tgt_event.c |  2 +-
 trace2/tr2_tgt_perf.c  |  2 +-
 trace2/tr2_tls.c       | 16 +++++++++-------
 trace2/tr2_tls.h       |  2 +-
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 37a3163be12..52f9356c695 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -90,7 +90,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread", ctx->thread_name.buf);
+	jw_object_string(jw, "thread", ctx->thread_name);
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 8cb792488c8..59ca58f862d 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -108,7 +108,7 @@ static void perf_fmt_prepare(const char *event_name,
 
 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
 	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
-		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
+		    ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
 		    event_name);
 
 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 4f7c516ecb6..3a67532aae4 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 					     uint64_t us_thread_start)
 {
 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
+	struct strbuf buf = STRBUF_INIT;
 
 	/*
 	 * Implicitly "tr2tls_push_self()" to capture the thread's start
@@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 
 	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
 
-	strbuf_init(&ctx->thread_name, 0);
+	strbuf_init(&buf, 0);
 	if (ctx->thread_id)
-		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_base_name);
-	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
-		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
+		strbuf_addf(&buf, "th%02d:", ctx->thread_id);
+	strbuf_addstr(&buf, thread_base_name);
+	if (buf.len > TR2_MAX_THREAD_NAME)
+		strbuf_setlen(&buf, TR2_MAX_THREAD_NAME);
+	ctx->thread_name = strbuf_detach(&buf, NULL);
 
 	pthread_setspecific(tr2tls_key, ctx);
 
@@ -95,7 +97,7 @@ void tr2tls_unset_self(void)
 
 	pthread_setspecific(tr2tls_key, NULL);
 
-	strbuf_release(&ctx->thread_name);
+	free((char *)ctx->thread_name);
 	free(ctx->array_us_start);
 	free(ctx);
 }
@@ -113,7 +115,7 @@ void tr2tls_pop_self(void)
 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 
 	if (!ctx->nr_open_regions)
-		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
+		BUG("no open regions in thread '%s'", ctx->thread_name);
 
 	ctx->nr_open_regions--;
 }
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 7d1f03a2ea6..e17cc462f87 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -15,7 +15,7 @@
 #define TR2_MAX_THREAD_NAME (24)
 
 struct tr2tls_thread_ctx {
-	struct strbuf thread_name;
+	const char *thread_name;
 	uint64_t *array_us_start;
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
-- 
gitgitgadget



* [PATCH v2 6/7] trace2: add stopwatch timers
  2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-10-12 18:52   ` [PATCH v2 5/7] trace2: convert ctx.thread_name from strbuf to pointer Jeff Hostetler via GitGitGadget
@ 2022-10-12 18:52   ` Jeff Hostetler via GitGitGadget
  2022-10-13 21:12     ` Junio C Hamano
  2022-10-12 18:52   ` [PATCH v2 7/7] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
  7 siblings, 1 reply; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-12 18:52 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add stopwatch timer mechanism to Trace2.

Timers are an alternative to Trace2 Regions.  Regions are useful for
measuring the time spent in various computation phases, such as the
time to read the index, time to scan for unstaged files, time to scan
for untracked files, etc.

However, regions are not appropriate in all places.  For example,
during a checkout, it would be very inefficient to use regions to
measure the total time spent inflating objects from the ODB across
the entire lifetime of the process: a per-unzip() region would flood
the output and significantly slow the command, and some form of
post-processing would be required to compute the time spent in unzip().

Timers can be used to measure a series of timer intervals and emit
a single summary event (at thread and/or process exit).
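
The bookkeeping behind such a summary can be sketched in stand-alone C
as follows.  This is not the tr2_tmr implementation: "struct stopwatch",
"stopwatch_start()" and "stopwatch_stop()" are hypothetical names, and
the caller passes in timestamps so that the sketch does not depend on
any particular clock.  It tracks the same quantities that the summary
event reports (interval count, total, shortest and longest interval).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator for one timer in one thread. */
struct stopwatch {
	uint64_t start_ns;  /* valid only while running */
	uint64_t total_ns;  /* sum of all completed intervals */
	uint64_t min_ns;    /* shortest completed interval */
	uint64_t max_ns;    /* longest completed interval */
	uint64_t intervals; /* number of start/stop pairs */
	int running;
};

static void stopwatch_start(struct stopwatch *sw, uint64_t now_ns)
{
	assert(sw && !sw->running);
	sw->start_ns = now_ns;
	sw->running = 1;
}

static void stopwatch_stop(struct stopwatch *sw, uint64_t now_ns)
{
	uint64_t elapsed_ns;

	assert(sw && sw->running && now_ns >= sw->start_ns);
	elapsed_ns = now_ns - sw->start_ns;
	sw->running = 0;

	sw->total_ns += elapsed_ns;
	/* The first interval initializes the minimum. */
	if (!sw->intervals || elapsed_ns < sw->min_ns)
		sw->min_ns = elapsed_ns;
	if (elapsed_ns > sw->max_ns)
		sw->max_ns = elapsed_ns;
	sw->intervals++;
}
```

Each start/stop pair costs a few integer operations and emits nothing;
only the accumulated values need to be written out once at the end.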

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt |  90 ++++++++++++
 Makefile                               |   1 +
 t/helper/test-trace2.c                 |  98 +++++++++++++
 t/t0211-trace2-perf.sh                 |  49 +++++++
 t/t0211/scrub_perf.perl                |   6 +
 trace2.c                               |  75 ++++++++++
 trace2.h                               |  43 ++++++
 trace2/tr2_tgt.h                       |   7 +
 trace2/tr2_tgt_event.c                 |  26 ++++
 trace2/tr2_tgt_normal.c                |  23 ++++
 trace2/tr2_tgt_perf.c                  |  24 ++++
 trace2/tr2_tls.c                       |  10 ++
 trace2/tr2_tls.h                       |  10 ++
 trace2/tr2_tmr.c                       | 182 +++++++++++++++++++++++++
 trace2/tr2_tmr.h                       | 140 +++++++++++++++++++
 15 files changed, 784 insertions(+)
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 9d43909d068..75ce6f45603 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -769,6 +769,42 @@ The "value" field may be an integer or a string.
 }
 ------------
 
+`"th_timer"`::
+	This event logs the amount of time that a stopwatch timer was
+	running in the thread.  This event is generated when a thread
+	exits for timers that requested per-thread events.
++
+------------
+{
+	"event":"th_timer",
+	...
+	"category":"my_category",
+	"name":"my_timer",
+	"intervals":5,         # number of time it was started/stopped
+	"t_total":0.052741,    # total time in seconds it was running
+	"t_min":0.010061,      # shortest interval
+	"t_max":0.011648       # longest interval
+}
+------------
+
+`"timer"`::
+	This event logs the amount of time that a stopwatch timer was
+	running aggregated across all threads.  This event is generated
+	when the process exits.
++
+------------
+{
+	"event":"timer",
+	...
+	"category":"my_category",
+	"name":"my_timer",
+	"intervals":5,         # number of time it was started/stopped
+	"t_total":0.052741,    # total time in seconds it was running
+	"t_min":0.010061,      # shortest interval
+	"t_max":0.011648       # longest interval
+}
+------------
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
@@ -1200,6 +1236,60 @@ d0 | main                     | data         | r0  |  0.002126 |  0.002126 | fsy
 d0 | main                     | exit         |     |  0.000470 |           |              | code:0
 d0 | main                     | atexit       |     |  0.000477 |           |              | code:0
 ----------------
+
+Stopwatch Timer Events::
+
+	Measure the time spent in a function call or span of code
+	that might be called from many places within the code
+	throughout the life of the process.
++
+----------------
+static void expensive_function(void)
+{
+	trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+	...
+	sleep_millisec(1000); // Do something expensive
+	...
+	trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+}
+
+static int ut_100timer(int argc, const char **argv)
+{
+	...
+
+	expensive_function();
+
+	// Do something else 1...
+
+	expensive_function();
+
+	// Do something else 2...
+
+	expensive_function();
+
+	return 0;
+}
+----------------
++
+In this example, we measure the total time spent in
+`expensive_function()` regardless of when it is called
+in the overall flow of the program.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ t/helper/test-tool trace2 100timer 3 1000
+...
+$ cat ~/log.perf
+d0 | main                     | version      |     |           |           |              | ...
+d0 | main                     | start        |     |  0.001453 |           |              | t/helper/test-tool trace2 100timer 3 1000
+d0 | main                     | cmd_name     |     |           |           |              | trace2 (trace2)
+d0 | main                     | exit         |     |  3.003667 |           |              | code:0
+d0 | main                     | timer        |     |           |           | test         | name:test1 intervals:3 total:3.001686 min:1.000254 max:1.000929
+d0 | main                     | atexit       |     |  3.003796 |           |              | code:0
+----------------
+
+
 == Future Work
 
 === Relationship to the Existing Trace Api (api-trace.txt)
diff --git a/Makefile b/Makefile
index cac3452edb9..820649bf62a 100644
--- a/Makefile
+++ b/Makefile
@@ -1102,6 +1102,7 @@ LIB_OBJS += trace2/tr2_tgt_event.o
 LIB_OBJS += trace2/tr2_tgt_normal.o
 LIB_OBJS += trace2/tr2_tgt_perf.o
 LIB_OBJS += trace2/tr2_tls.o
+LIB_OBJS += trace2/tr2_tmr.o
 LIB_OBJS += trailer.o
 LIB_OBJS += transport-helper.o
 LIB_OBJS += transport.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index a714130ece7..f951b9e97d7 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -228,6 +228,101 @@ static int ut_010bug_BUG(int argc, const char **argv)
 	BUG("a %s message", "BUG");
 }
 
+/*
+ * Single-threaded timer test.  Create several intervals using the
+ * TEST1 timer.  The test script can verify that an aggregate Trace2
+ * "timer" event is emitted indicating that we started+stopped the
+ * timer the requested number of times.
+ */
+static int ut_100timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay>";
+
+	int count = 0;
+	int delay = 0;
+	int k;
+
+	if (argc != 2)
+		die("%s", usage_error);
+	if (get_i(&count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&delay, argv[1]))
+		die("%s", usage_error);
+
+	for (k = 0; k < count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+		sleep_millisec(delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+	}
+
+	return 0;
+}
+
+struct ut_101_data {
+	int count;
+	int delay;
+};
+
+static void *ut_101timer_thread_proc(void *_ut_101_data)
+{
+	struct ut_101_data *data = _ut_101_data;
+	int k;
+
+	trace2_thread_start("ut_101");
+
+	for (k = 0; k < data->count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST2);
+		sleep_millisec(data->delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST2);
+	}
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+/*
+ * Multi-threaded timer test.  Create several threads that each create
+ * several intervals using the TEST2 timer.  The test script can verify
+ * that individual Trace2 "th_timer" events for each thread and an
+ * aggregate "timer" event are generated.
+ */
+static int ut_101timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay> <threads>";
+
+	struct ut_101_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.delay, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_101timer_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -248,6 +343,9 @@ static struct unit_test ut_table[] = {
 	{ ut_008bug,      "008bug",    "" },
 	{ ut_009bug_BUG,  "009bug_BUG","" },
 	{ ut_010bug_BUG,  "010bug_BUG","" },
+
+	{ ut_100timer,    "100timer",  "<count> <ms_delay>" },
+	{ ut_101timer,    "101timer",  "<count> <ms_delay> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 22d0845544e..5c28424e657 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -173,4 +173,53 @@ test_expect_success 'using global config, perf stream, return code 0' '
 	test_cmp expect actual
 '
 
+# Exercise the stopwatch timers in a loop and confirm that we have
+# as many start/stop intervals as expected.  We cannot really test the
+# actual (total, min, max) timer values, so we have to assume that they
+# are good, but we can verify the interval count.
+#
+# The timer "test/test1" should only emit a global summary "timer" event.
+# The timer "test/test2" should emit per-thread "th_timer" events and a
+# global summary "timer" event.
+
+have_timer_event () {
+	thread=$1 event=$2 category=$3 name=$4 intervals=$5 file=$6 &&
+
+	pattern="d0|${thread}|${event}||||${category}|name:${name} intervals:${intervals}" &&
+
+	grep "${pattern}" ${file}
+}
+
+test_expect_success 'stopwatch timer test/test1' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the timer "test1" 5 times from "main".
+	test-tool trace2 100timer 5 10 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_timer_event "main" "timer" "test" "test1" 5 actual
+'
+
+test_expect_success 'stopwatch timer test/test2' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the timer "test2" 5 times each in 3 threads.
+	test-tool trace2 101timer 5 10 3 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	# So we should have 3 per-thread events of 5 each.
+	have_timer_event "th01:ut_101" "th_timer" "test" "test2" 5 actual &&
+	have_timer_event "th02:ut_101" "th_timer" "test" "test2" 5 actual &&
+	have_timer_event "th03:ut_101" "th_timer" "test" "test2" 5 actual &&
+
+	# And we should have 15 total uses.
+	have_timer_event "main" "timer" "test" "test2" 15 actual
+'
+
 test_done
diff --git a/t/t0211/scrub_perf.perl b/t/t0211/scrub_perf.perl
index 299999f0f89..7a50bae6463 100644
--- a/t/t0211/scrub_perf.perl
+++ b/t/t0211/scrub_perf.perl
@@ -64,6 +64,12 @@ while (<>) {
 	    goto SKIP_LINE;
 	}
     }
+    elsif ($tokens[$col_event] =~ m/timer/) {
+	# This also captures "th_timer" events
+	$tokens[$col_rest] =~ s/ total:\d+\.\d*/ total:_T_TOTAL_/;
+	$tokens[$col_rest] =~ s/ min:\d+\.\d*/ min:_T_MIN_/;
+	$tokens[$col_rest] =~ s/ max:\d+\.\d*/ max:_T_MAX_/;
+    }
 
     # t_abs and t_rel are either blank or a float.  Replace the float
     # with a constant for matching the HEREDOC in the test script.
diff --git a/trace2.c b/trace2.c
index 165264dc79a..a93cab7c2b7 100644
--- a/trace2.c
+++ b/trace2.c
@@ -13,6 +13,7 @@
 #include "trace2/tr2_sysenv.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static int trace2_enabled;
 
@@ -83,6 +84,23 @@ static void tr2_tgt_disable_builtins(void)
 		tgt_j->pfn_term();
 }
 
+/*
+ * The signature of this function must match the pfn_timer
+ * method in the targets.  (Think of this as an apply operation
+ * across the set of active targets.)
+ */
+static void tr2_tgt_emit_a_timer(const struct tr2_timer_metadata *meta,
+				 const struct tr2_timer *timer,
+				 int is_final_data)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_timer)
+			tgt_j->pfn_timer(meta, timer, is_final_data);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -110,6 +128,26 @@ static void tr2main_atexit_handler(void)
 	 */
 	tr2tls_pop_unwind_self();
 
+	/*
+	 * Some timers want per-thread details.  If the main thread
+	 * used one of those timers, emit the details now (before
+	 * we emit the aggregate timer values).
+	 */
+	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+
+	/*
+	 * Add stopwatch timer data for the main thread to the final
+	 * totals.  And then emit the final timer values.
+	 *
+	 * Technically, we shouldn't need to hold the lock to update
+	 * and output the final_timer_block (since all other threads
+	 * should be dead by now), but it doesn't hurt anything.
+	 */
+	tr2tls_lock();
+	tr2_update_final_timers();
+	tr2_emit_final_timers(tr2_tgt_emit_a_timer);
+	tr2tls_unlock();
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_atexit)
 			tgt_j->pfn_atexit(us_elapsed_absolute,
@@ -541,6 +579,21 @@ void trace2_thread_exit_fl(const char *file, int line)
 	tr2tls_pop_unwind_self();
 	us_elapsed_thread = tr2tls_region_elasped_self(us_now);
 
+	/*
+	 * Some timers want per-thread details.  If this thread used
+	 * one of those timers, emit the details now.
+	 */
+	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+
+	/*
+	 * Add stopwatch timer data from the current (non-main) thread
+	 * to the final totals.  (We'll accumulate data for the main
+	 * thread later during "atexit".)
+	 */
+	tr2tls_lock();
+	tr2_update_final_timers();
+	tr2tls_unlock();
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_thread_exit_fl)
 			tgt_j->pfn_thread_exit_fl(file, line,
@@ -795,6 +848,28 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...)
 	va_end(ap);
 }
 
+void trace2_timer_start(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("trace2_timer_start: invalid timer id: %d", tid);
+
+	tr2_start_timer(tid);
+}
+
+void trace2_timer_stop(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("trace2_timer_stop: invalid timer id: %d", tid);
+
+	tr2_stop_timer(tid);
+}
+
 const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
diff --git a/trace2.h b/trace2.h
index 74cdb1354f7..7a843ac0518 100644
--- a/trace2.h
+++ b/trace2.h
@@ -51,6 +51,7 @@ struct json_writer;
  * [] trace2_region*    -- emit region nesting messages.
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
+ * [] trace2_timer*     -- stopwatch timers (messages are deferred).
  */
 
 /*
@@ -485,6 +486,48 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...);
 
 #define trace2_printf(...) trace2_printf_fl(__FILE__, __LINE__, __VA_ARGS__)
 
+/*
+ * Define the set of stopwatch timers.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any values added to this enum must also be added to the
+ * `tr2_timer_metadata[]` in `trace2/tr2_tmr.c`.
+ */
+enum trace2_timer_id {
+	/*
+	 * Define two timers for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_TIMER_ID_TEST2,     /* emits summary and thread events */
+
+	/* Add additional timer definitions before here. */
+	TRACE2_NUMBER_OF_TIMERS
+};
+
+/*
+ * Start/Stop the indicated stopwatch timer in the current thread.
+ *
+ * The time spent by the current thread between the _start and _stop
+ * calls will be added to the thread's partial sum for this timer.
+ *
+ * Timer events are emitted at thread and program exit.
+ *
+ * Note: Since the stopwatch API routines do not generate individual
+ * events, they do not take (file, line) arguments.  Similarly, the
+ * category and timer name values are defined at compile-time in the
+ * timer definitions array, so they are not needed here in the API.
+ */
+void trace2_timer_start(enum trace2_timer_id tid);
+void trace2_timer_stop(enum trace2_timer_id tid);
+
 /*
  * Optional platform-specific code to dump information about the
  * current and any parent process(es).  This is intended to allow
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 65f94e15748..2a80bef0df5 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -4,6 +4,8 @@
 struct child_process;
 struct repository;
 struct json_writer;
+struct tr2_timer_metadata;
+struct tr2_timer;
 
 /*
  * Function prototypes for a TRACE2 "target" vtable.
@@ -96,6 +98,10 @@ typedef void(tr2_tgt_evt_printf_va_fl_t)(const char *file, int line,
 					 uint64_t us_elapsed_absolute,
 					 const char *fmt, va_list ap);
 
+typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta,
+				  const struct tr2_timer *timer,
+				  int is_final_data);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -132,6 +138,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_fl_t                   *pfn_data_fl;
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
+	tr2_tgt_evt_timer_t                     *pfn_timer;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 52f9356c695..1196da89ba4 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -9,6 +9,7 @@
 #include "trace2/tr2_sysenv.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_event = {
 	.sysenv_var = TR2_SYSENV_EVENT,
@@ -617,6 +618,30 @@ static void fn_data_json_fl(const char *file, int line,
 	}
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct json_writer jw = JSON_WRITER_INIT;
+	double t_total = ((double)timer->total_ns) / 1000000000.0;
+	double t_min = ((double)timer->min_ns) / 1000000000.0;
+	double t_max = ((double)timer->max_ns) / 1000000000.0;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	jw_object_string(&jw, "category", meta->category);
+	jw_object_string(&jw, "name", meta->name);
+	jw_object_intmax(&jw, "intervals", timer->interval_count);
+	jw_object_double(&jw, "t_total", 6, t_total);
+	jw_object_double(&jw, "t_min", 6, t_min);
+	jw_object_double(&jw, "t_max", 6, t_max);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	.pdst = &tr2dst_event,
 
@@ -648,4 +673,5 @@ struct tr2_tgt tr2_tgt_event = {
 	.pfn_data_fl = fn_data_fl,
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = NULL,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 69f80330778..3888c10ef50 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -8,6 +8,7 @@
 #include "trace2/tr2_tbuf.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_normal = {
 	.sysenv_var = TR2_SYSENV_NORMAL,
@@ -329,6 +330,27 @@ static void fn_printf_va_fl(const char *file, int line,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct strbuf buf_payload = STRBUF_INIT;
+	double t_total = ((double)timer->total_ns) / 1000000000.0;
+	double t_min = ((double)timer->min_ns) / 1000000000.0;
+	double t_max = ((double)timer->max_ns) / 1000000000.0;
+
+	strbuf_addf(&buf_payload, ("%s %s/%s"
+				   " intervals:%"PRIu64
+				   " total:%8.6f min:%8.6f max:%8.6f"),
+		    event_name, meta->category, meta->name,
+		    timer->interval_count,
+		    t_total, t_min, t_max);
+
+	normal_io_write_fl(__FILE__, __LINE__, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_normal = {
 	.pdst = &tr2dst_normal,
 
@@ -360,4 +382,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	.pfn_data_fl = NULL,
 	.pfn_data_json_fl = NULL,
 	.pfn_printf_va_fl = fn_printf_va_fl,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 59ca58f862d..89b30ddc0e4 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -10,6 +10,7 @@
 #include "trace2/tr2_tbuf.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_perf = {
 	.sysenv_var = TR2_SYSENV_PERF,
@@ -555,6 +556,28 @@ static void fn_printf_va_fl(const char *file, int line,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct strbuf buf_payload = STRBUF_INIT;
+	double t_total = ((double)timer->total_ns) / 1000000000.0;
+	double t_min = ((double)timer->min_ns) / 1000000000.0;
+	double t_max = ((double)timer->max_ns) / 1000000000.0;
+
+	strbuf_addf(&buf_payload, ("name:%s"
+				   " intervals:%"PRIu64
+				   " total:%8.6f min:%8.6f max:%8.6f"),
+		    meta->name,
+		    timer->interval_count,
+		    t_total, t_min, t_max);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, NULL, NULL,
+			 meta->category, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	.pdst = &tr2dst_perf,
 
@@ -586,4 +609,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	.pfn_data_fl = fn_data_fl,
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = fn_printf_va_fl,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 3a67532aae4..04900bb4c3a 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -181,3 +181,13 @@ int tr2tls_locked_increment(int *p)
 
 	return current_value;
 }
+
+void tr2tls_lock(void)
+{
+	pthread_mutex_lock(&tr2tls_mutex);
+}
+
+void tr2tls_unlock(void)
+{
+	pthread_mutex_unlock(&tr2tls_mutex);
+}
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index e17cc462f87..2322b0d0ef0 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_tmr.h"
 
 /*
  * Notice: the term "TLS" refers to "thread-local storage" in the
@@ -20,6 +21,9 @@ struct tr2tls_thread_ctx {
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
+	struct tr2_timer_block timer_block;
+	unsigned int used_any_timer:1;
+	unsigned int used_any_per_thread_timer:1;
 };
 
 /*
@@ -107,4 +111,10 @@ int tr2tls_locked_increment(int *p);
  */
 void tr2tls_start_process_clock(void);
 
+/*
+ * Explicitly lock/unlock our mutex.
+ */
+void tr2tls_lock(void);
+void tr2tls_unlock(void);
+
 #endif /* TR2_TLS_H */
diff --git a/trace2/tr2_tmr.c b/trace2/tr2_tmr.c
new file mode 100644
index 00000000000..786762dfd26
--- /dev/null
+++ b/trace2/tr2_tmr.c
@@ -0,0 +1,182 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tgt.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
+
+#define MY_MAX(a, b) ((a) > (b) ? (a) : (b))
+#define MY_MIN(a, b) ((a) < (b) ? (a) : (b))
+
+/*
+ * A global timer block to aggregate values from the partial sums from
+ * each thread.
+ */
+static struct tr2_timer_block final_timer_block; /* access under tr2tls_mutex */
+
+/*
+ * Define metadata for each stopwatch timer.
+ *
+ * This array must match "enum trace2_timer_id" and the values
+ * in "struct tr2_timer_block.timer[*]".
+ */
+static struct tr2_timer_metadata tr2_timer_metadata[TRACE2_NUMBER_OF_TIMERS] = {
+	[TRACE2_TIMER_ID_TEST1] = {
+		.category = "test",
+		.name = "test1",
+		.want_per_thread_events = 0,
+	},
+	[TRACE2_TIMER_ID_TEST2] = {
+		.category = "test",
+		.name = "test2",
+		.want_per_thread_events = 1,
+	},
+
+	/* Add additional metadata before here. */
+};
+
+void tr2_start_timer(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_timer *t = &ctx->timer_block.timer[tid];
+
+	t->recursion_count++;
+	if (t->recursion_count > 1)
+		return; /* ignore recursive starts */
+
+	t->start_ns = getnanotime();
+}
+
+void tr2_stop_timer(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_timer *t = &ctx->timer_block.timer[tid];
+	uint64_t ns_now;
+	uint64_t ns_interval;
+
+	assert(t->recursion_count > 0);
+
+	t->recursion_count--;
+	if (t->recursion_count)
+		return; /* still in recursive call(s) */
+
+	ns_now = getnanotime();
+	ns_interval = ns_now - t->start_ns;
+
+	t->total_ns += ns_interval;
+
+	/*
+	 * min_ns was initialized to zero (in the xcalloc()) rather
+	 * than UINT_MAX when the block of timers was allocated,
+	 * so we should always set both the min_ns and max_ns values
+	 * the first time that the timer is used.
+	 */
+	if (!t->interval_count) {
+		t->min_ns = ns_interval;
+		t->max_ns = ns_interval;
+	} else {
+		t->min_ns = MY_MIN(ns_interval, t->min_ns);
+		t->max_ns = MY_MAX(ns_interval, t->max_ns);
+	}
+
+	t->interval_count++;
+
+	ctx->used_any_timer = 1;
+	if (tr2_timer_metadata[tid].want_per_thread_events)
+		ctx->used_any_per_thread_timer = 1;
+}
+
+void tr2_update_final_timers(void)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_timer_id tid;
+
+	if (!ctx->used_any_timer)
+		return;
+
+	/*
+	 * Accessing `final_timer_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) {
+		struct tr2_timer *t_final = &final_timer_block.timer[tid];
+		struct tr2_timer *t = &ctx->timer_block.timer[tid];
+
+		if (t->recursion_count) {
+			/*
+			 * The current thread is exiting with
+			 * timer[tid] still running.
+			 *
+			 * Technically, this is a bug, but I'm going
+			 * to ignore it.
+			 *
+			 * I don't think it is worth calling die()
+			 * for.  I don't think it is worth killing the
+			 * process for this bookkeeping error.  We
+			 * might want to call warning(), but I'm going
+			 * to wait on that.
+			 *
+			 * The downside here is that total_ns won't
+			 * include the current open interval (now -
+			 * start_ns).  I can live with that.
+			 */
+		}
+
+		if (!t->interval_count)
+			continue; /* this timer was not used by this thread */
+
+		t_final->total_ns += t->total_ns;
+
+		/*
+		 * final_timer_block.timer[tid].min_ns was initialized to
+		 * zero rather than UINT_MAX, so we should always set both
+		 * the min_ns and max_ns values the first time that we add
+		 * a partial sum into it.
+		 */
+		if (!t_final->interval_count) {
+			t_final->min_ns = t->min_ns;
+			t_final->max_ns = t->max_ns;
+		} else {
+			t_final->min_ns = MY_MIN(t_final->min_ns, t->min_ns);
+			t_final->max_ns = MY_MAX(t_final->max_ns, t->max_ns);
+		}
+
+		t_final->interval_count += t->interval_count;
+	}
+}
+
+void tr2_emit_per_thread_timers(tr2_tgt_evt_timer_t *fn_apply)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_timer_id tid;
+
+	if (!ctx->used_any_per_thread_timer)
+		return;
+
+	/*
+	 * For each timer, if the timer wants per-thread events and
+	 * this thread used it, emit it.
+	 */
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++)
+		if (tr2_timer_metadata[tid].want_per_thread_events &&
+		    ctx->timer_block.timer[tid].interval_count)
+			fn_apply(&tr2_timer_metadata[tid],
+				 &ctx->timer_block.timer[tid],
+				 0);
+}
+
+void tr2_emit_final_timers(tr2_tgt_evt_timer_t *fn_apply)
+{
+	enum trace2_timer_id tid;
+
+	/*
+	 * Accessing `final_timer_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++)
+		if (final_timer_block.timer[tid].interval_count)
+			fn_apply(&tr2_timer_metadata[tid],
+				 &final_timer_block.timer[tid],
+				 1);
+}
diff --git a/trace2/tr2_tmr.h b/trace2/tr2_tmr.h
new file mode 100644
index 00000000000..d5753576134
--- /dev/null
+++ b/trace2/tr2_tmr.h
@@ -0,0 +1,140 @@
+#ifndef TR2_TMR_H
+#define TR2_TMR_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow "stopwatch" timers.
+ *
+ * Timers can be used to measure "interesting" activity that does not
+ * fit the "region" model, such as code called from many different
+ * regions (like zlib) and/or where data for individual calls are not
+ * interesting or are too numerous to be efficiently logged.
+ *
+ * Timer values are accumulated during program execution and emitted
+ * to the Trace2 logs at program exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set of
+ * timers and timer ids using a "timer block" array in thread-local
+ * storage.  This gives us constant time access to each timer within
+ * each thread, since we want start/stop operations to be as fast as
+ * possible.  This lets us avoid the complexities of dynamically
+ * allocating a timer on the first use by a thread and/or possibly
+ * sharing that timer definition with other concurrent threads.
+ * However, this does require that we define the set of timers at
+ * compile time.
+ *
+ * Each thread uses the timer block in its thread-local storage to
+ * compute partial sums for each timer (without locking).  When a
+ * thread exits, those partial sums are (under lock) added to the
+ * global final sum.
+ *
+ * Using this "timer block" model costs ~48 bytes per timer per thread
+ * (we have about six uint64 fields per timer).  This does increase
+ * the size of the thread-local storage block, but it is allocated (at
+ * thread create time) and not on the thread stack, so I'm not worried
+ * about the size.
+ *
+ * Partial sums for each timer are optionally emitted when a thread
+ * exits.
+ *
+ * Final sums for each timer are emitted between the "exit" and
+ * "atexit" events.
+ *
+ * A parallel "timer metadata" table contains the "category" and "name"
+ * fields for each timer.  This eliminates the need to include those
+ * args in the various timer APIs.
+ */
+
+/*
+ * The definition of an individual timer as used by an individual
+ * thread.
+ */
+struct tr2_timer {
+	/*
+	 * Total elapsed time for this timer in this thread in nanoseconds.
+	 */
+	uint64_t total_ns;
+
+	/*
+	 * The maximum and minimum interval values observed for this
+	 * timer in this thread.
+	 */
+	uint64_t min_ns;
+	uint64_t max_ns;
+
+	/*
+	 * The value of the clock when this timer was started in this
+	 * thread.  (Undefined when the timer is not active in this
+	 * thread.)
+	 */
+	uint64_t start_ns;
+
+	/*
+	 * Number of times that this timer has been started and stopped
+	 * in this thread.  (Recursive starts are ignored.)
+	 */
+	uint64_t interval_count;
+
+	/*
+	 * Number of nested starts on the stack in this thread.  (We
+	 * ignore recursive starts and use this to track the recursive
+	 * calls.)
+	 */
+	unsigned int recursion_count;
+};
+
+/*
+ * Metadata for a timer.
+ */
+struct tr2_timer_metadata {
+	const char *category;
+	const char *name;
+
+	/*
+	 * True if we should emit per-thread events for this timer
+	 * when individual threads exit.
+	 */
+	unsigned int want_per_thread_events:1;
+};
+
+/*
+ * A compile-time fixed-size block of timers to insert into
+ * thread-local storage.  This wrapper is used to avoid quirks
+ * of C and the usual need to pass an array size argument.
+ */
+struct tr2_timer_block {
+	struct tr2_timer timer[TRACE2_NUMBER_OF_TIMERS];
+};
+
+/*
+ * Private routines used by trace2.c to actually start/stop an
+ * individual timer in the current thread.
+ */
+void tr2_start_timer(enum trace2_timer_id tid);
+void tr2_stop_timer(enum trace2_timer_id tid);
+
+/*
+ * Add the current thread's timer data to the global totals.
+ * This is called during thread-exit.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_update_final_timers(void);
+
+/*
+ * Emit per-thread timer data for the current thread.
+ * This is called during thread-exit.
+ */
+void tr2_emit_per_thread_timers(tr2_tgt_evt_timer_t *fn_apply);
+
+/*
+ * Emit global total timer values.
+ * This is called during atexit handling.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_emit_final_timers(tr2_tgt_evt_timer_t *fn_apply);
+
+#endif /* TR2_TMR_H */
-- 
gitgitgadget



* [PATCH v2 7/7] trace2: add global counter mechanism
  2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-10-12 18:52   ` [PATCH v2 6/7] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
@ 2022-10-12 18:52   ` Jeff Hostetler via GitGitGadget
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
  7 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-12 18:52 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add a global counter mechanism to Trace2.

The Trace2 counters mechanism adds the ability to create a set of
global counter variables and an API to increment them efficiently.
Counters can optionally report per-thread usage in addition to the sum
across all threads.

Counter events are emitted to the Trace2 logs when a thread exits and
at process exit.

Counters are an alternative to `data` and `data_json` events.

Counters are useful when you want to measure something across the life
of the process, when you don't want per-measurement events for
performance reasons, when the data does not fit conveniently within a
region, or when your control flow does not easily let you write the
final total.  For example, you might use this to report the number of
calls to unzip() or the number of de-delta steps during a checkout.
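
As a rough illustration of the model described above (not code from
this patch), the following standalone C sketch keeps one fixed-size
block of counters per thread, adds to it without locking, and folds
each block into a process-wide total when the thread finishes.  All
`sketch_*` names and the two counter ids are hypothetical.  Threads are
modeled as plain structs, and the mutex that a real multi-threaded
merge would need is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter ids; contiguous from 0 so they can index arrays. */
enum sketch_counter_id {
	SKETCH_COUNTER_UNZIP_CALLS = 0,
	SKETCH_COUNTER_DELTA_STEPS,
	SKETCH_NUMBER_OF_COUNTERS
};

/* One fixed-size block of counters; each thread would own one. */
struct sketch_counter_block {
	uint64_t value[SKETCH_NUMBER_OF_COUNTERS];
};

/*
 * Process-wide totals.  Assumption: a threaded program would only
 * touch this while holding a mutex; this sketch is single-threaded.
 */
struct sketch_counter_block sketch_final_block;

/* Hot path: bump the calling thread's partial sum; no lock needed. */
void sketch_counter_add(struct sketch_counter_block *self,
			enum sketch_counter_id cid, uint64_t amount)
{
	assert((int)cid >= 0 && (int)cid < (int)SKETCH_NUMBER_OF_COUNTERS);
	self->value[cid] += amount;
}

/* Thread exit: fold this thread's partial sums into the totals. */
void sketch_counter_merge(const struct sketch_counter_block *self)
{
	int cid;

	for (cid = 0; cid < (int)SKETCH_NUMBER_OF_COUNTERS; cid++)
		sketch_final_block.value[cid] += self->value[cid];
}

/* Process exit: read one aggregated total. */
uint64_t sketch_counter_total(enum sketch_counter_id cid)
{
	return sketch_final_block.value[cid];
}
```

With this shape, a caller bumps the counter in the hot path and relies
on the merge at thread exit to produce a single total, rather than
emitting one `data` event per call.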

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt |  31 ++++++++
 Makefile                               |   1 +
 t/helper/test-trace2.c                 |  89 +++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  46 +++++++++++
 trace2.c                               |  52 +++++++++++--
 trace2.h                               |  37 +++++++++
 trace2/tr2_ctr.c                       | 101 ++++++++++++++++++++++++
 trace2/tr2_ctr.h                       | 104 +++++++++++++++++++++++++
 trace2/tr2_tgt.h                       |   7 ++
 trace2/tr2_tgt_event.c                 |  19 +++++
 trace2/tr2_tgt_normal.c                |  16 ++++
 trace2/tr2_tgt_perf.c                  |  17 ++++
 trace2/tr2_tls.h                       |   4 +
 13 files changed, 517 insertions(+), 7 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 75ce6f45603..de5fc250595 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -805,6 +805,37 @@ The "value" field may be an integer or a string.
 }
 ------------
 
+`"th_counter"`::
+	This event logs the value of a counter variable in a thread.
+	This event is generated when a thread exits for counters that
+	requested per-thread events.
++
+------------
+{
+	"event":"th_counter",
+	...
+	"category":"my_category",
+	"name":"my_counter",
+	"count":23
+}
+------------
+
+`"counter"`::
+	This event logs the value of a counter variable across all threads.
+	This event is generated when the process exits.  The total value
+	reported here is the sum across all threads.
++
+------------
+{
+	"event":"counter",
+	...
+	"category":"my_category",
+	"name":"my_counter",
+	"count":23
+}
+------------
+
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
diff --git a/Makefile b/Makefile
index 820649bf62a..29ab417ca3a 100644
--- a/Makefile
+++ b/Makefile
@@ -1094,6 +1094,7 @@ LIB_OBJS += trace.o
 LIB_OBJS += trace2.o
 LIB_OBJS += trace2/tr2_cfg.o
 LIB_OBJS += trace2/tr2_cmd_name.o
+LIB_OBJS += trace2/tr2_ctr.o
 LIB_OBJS += trace2/tr2_dst.o
 LIB_OBJS += trace2/tr2_sid.o
 LIB_OBJS += trace2/tr2_sysenv.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index f951b9e97d7..1b092c60714 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -323,6 +323,92 @@ static int ut_101timer(int argc, const char **argv)
 	return 0;
 }
 
+/*
+ * Single-threaded counter test.  Add several values to the TEST1 counter.
+ * The test script can verify that the final sum is reported in the "counter"
+ * event.
+ */
+static int ut_200counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> [<v2> [...]]";
+	int value;
+	int k;
+
+	if (argc < 1)
+		die("%s", usage_error);
+
+	for (k = 0; k < argc; k++) {
+		if (get_i(&value, argv[k]))
+			die("invalid value[%s] -- %s",
+			    argv[k], usage_error);
+		trace2_counter_add(TRACE2_COUNTER_ID_TEST1, value);
+	}
+
+	return 0;
+}
+
+/*
+ * Multi-threaded counter test.  Create several threads that each increment
+ * the TEST2 global counter.  The test script can verify that an individual
+ * "th_counter" event is generated with a partial sum for each thread and
+ * that a final aggregate "counter" event is generated.
+ */
+
+struct ut_201_data {
+	int v1;
+	int v2;
+};
+
+static void *ut_201counter_thread_proc(void *_ut_201_data)
+{
+	struct ut_201_data *data = _ut_201_data;
+
+	trace2_thread_start("ut_201");
+
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v1);
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v2);
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+static int ut_201counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> <v2> <threads>";
+
+	struct ut_201_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.v1, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.v2, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_201counter_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -346,6 +432,9 @@ static struct unit_test ut_table[] = {
 
 	{ ut_100timer,    "100timer",  "<count> <ms_delay>" },
 	{ ut_101timer,    "101timer",  "<count> <ms_delay> <threads>" },
+
+	{ ut_200counter,  "200counter", "<v1> [<v2> [<v3> [...]]]" },
+	{ ut_201counter,  "201counter", "<v1> <v2> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 5c28424e657..0b3436e8cac 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -222,4 +222,50 @@ test_expect_success 'stopwatch timer test/test2' '
 	have_timer_event "main" "timer" "test" "test2" 15 actual
 '
 
+# Exercise the global counters and confirm that we get the expected values.
+#
+# The counter "test/test1" should only emit a global summary "counter" event.
+# The counter "test/test2" could emit per-thread "th_counter" events and a
+# global summary "counter" event.
+
+have_counter_event () {
+	thread=$1 event=$2 category=$3 name=$4 value=$5 file=$6 &&
+
+	pattern="d0|${thread}|${event}||||${category}|name:${name} value:${value}" &&
+
+	grep "${pattern}" ${file}
+}
+
+test_expect_success 'global counter test/test1' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the counter "test1" and add n integers.
+	test-tool trace2 200counter 1 2 3 4 5 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_counter_event "main" "counter" "test" "test1" 15 actual
+'
+
+test_expect_success 'global counter test/test2' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Add 2 integers to the counter "test2" in each of 3 threads.
+	test-tool trace2 201counter 7 13 3 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	# So we should have 3 per-thread events of 20 each.
+	have_counter_event "th01:ut_201" "th_counter" "test" "test2" 20 actual &&
+	have_counter_event "th02:ut_201" "th_counter" "test" "test2" 20 actual &&
+	have_counter_event "th03:ut_201" "th_counter" "test" "test2" 20 actual &&
+
+	# And we should have a single event with the total across all threads.
+	have_counter_event "main" "counter" "test" "test2" 60 actual
+'
+
 test_done
diff --git a/trace2.c b/trace2.c
index a93cab7c2b7..279bddf53b4 100644
--- a/trace2.c
+++ b/trace2.c
@@ -8,6 +8,7 @@
 #include "version.h"
 #include "trace2/tr2_cfg.h"
 #include "trace2/tr2_cmd_name.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
 #include "trace2/tr2_sysenv.h"
@@ -101,6 +102,22 @@ static void tr2_tgt_emit_a_timer(const struct tr2_timer_metadata *meta,
 			tgt_j->pfn_timer(meta, timer, is_final_data);
 }
 
+/*
+ * The signature of this function must match the pfn_counter
+ * method in the targets.
+ */
+static void tr2_tgt_emit_a_counter(const struct tr2_counter_metadata *meta,
+				   const struct tr2_counter *counter,
+				   int is_final_data)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_counter)
+			tgt_j->pfn_counter(meta, counter, is_final_data);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -132,20 +149,26 @@ static void tr2main_atexit_handler(void)
 	 * Some timers want per-thread details.  If the main thread
 	 * used one of those timers, emit the details now (before
 	 * we emit the aggregate timer values).
+	 *
+	 * Likewise for counters.
 	 */
 	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_per_thread_counters(tr2_tgt_emit_a_counter);
 
 	/*
-	 * Add stopwatch timer data for the main thread to the final
-	 * totals.  And then emit the final timer values.
+	 * Add stopwatch timer and counter data for the main thread to
+	 * the final totals.  And then emit the final values.
 	 *
 	 * Technically, we shouldn't need to hold the lock to update
-	 * and output the final_timer_block (since all other threads
-	 * should be dead by now), but it doesn't hurt anything.
+	 * and output the final_timer_block and final_counter_block
+	 * (since all other threads should be dead by now), but it
+	 * doesn't hurt anything.
 	 */
 	tr2tls_lock();
 	tr2_update_final_timers();
+	tr2_update_final_counters();
 	tr2_emit_final_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_final_counters(tr2_tgt_emit_a_counter);
 	tr2tls_unlock();
 
 	for_each_wanted_builtin (j, tgt_j)
@@ -582,16 +605,20 @@ void trace2_thread_exit_fl(const char *file, int line)
 	/*
 	 * Some timers want per-thread details.  If this thread used
 	 * one of those timers, emit the details now.
+	 *
+	 * Likewise for counters.
 	 */
 	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_per_thread_counters(tr2_tgt_emit_a_counter);
 
 	/*
-	 * Add stopwatch timer data from the current (non-main) thread
-	 * to the final totals.  (We'll accumulate data for the main
-	 * thread later during "atexit".)
+	 * Add stopwatch timer and counter data from the current
+	 * (non-main) thread to the final totals.  (We'll accumulate
+	 * data for the main thread later during "atexit".)
 	 */
 	tr2tls_lock();
 	tr2_update_final_timers();
+	tr2_update_final_counters();
 	tr2tls_unlock();
 
 	for_each_wanted_builtin (j, tgt_j)
@@ -870,6 +897,17 @@ void trace2_timer_stop(enum trace2_timer_id tid)
 	tr2_stop_timer(tid);
 }
 
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (cid < 0 || cid >= TRACE2_NUMBER_OF_COUNTERS)
+		BUG("trace2_counter_add: invalid counter id: %d", cid);
+
+	tr2_counter_increment(cid, value);
+}
+
 const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
diff --git a/trace2.h b/trace2.h
index 7a843ac0518..4ced30c0db3 100644
--- a/trace2.h
+++ b/trace2.h
@@ -52,6 +52,7 @@ struct json_writer;
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
  * [] trace2_timer*     -- stopwatch timers (messages are deferred).
+ * [] trace2_counter*   -- global counters (messages are deferred).
  */
 
 /*
@@ -528,6 +529,42 @@ enum trace2_timer_id {
 void trace2_timer_start(enum trace2_timer_id tid);
 void trace2_timer_stop(enum trace2_timer_id tid);
 
+/*
+ * Define the set of global counters.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any values added to this enum must also be added to the
+ * `tr2_counter_metadata[]` in `trace2/tr2_ctr.c`.
+ */
+enum trace2_counter_id {
+	/*
+	 * Define two counters for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
+
+	/* Add additional counter definitions before here. */
+	TRACE2_NUMBER_OF_COUNTERS
+};
+
+/*
+ * Increase the named global counter by value.
+ *
+ * Note that this adds `value` to the current thread's partial sum for
+ * this counter (without locking) and that the complete sum is not
+ * available until all threads have exited, so it does not return the
+ * new value of the counter.
+ */
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value);
+
 /*
  * Optional platform-specific code to dump information about the
  * current and any parent process(es).  This is intended to allow
diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c
new file mode 100644
index 00000000000..483ca7c308f
--- /dev/null
+++ b/trace2/tr2_ctr.c
@@ -0,0 +1,101 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tgt.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_ctr.h"
+
+/*
+ * A global counter block to aggregate values from the partial sums
+ * from each thread.
+ */
+static struct tr2_counter_block final_counter_block; /* access under tr2tls_mutex */
+
+/*
+ * Define metadata for each global counter.
+ *
+ * This array must match the "enum trace2_counter_id" and the values
+ * in "struct tr2_counter_block.counter[*]".
+ */
+static struct tr2_counter_metadata tr2_counter_metadata[TRACE2_NUMBER_OF_COUNTERS] = {
+	[TRACE2_COUNTER_ID_TEST1] = {
+		.category = "test",
+		.name = "test1",
+		.want_per_thread_events = 0,
+	},
+	[TRACE2_COUNTER_ID_TEST2] = {
+		.category = "test",
+		.name = "test2",
+		.want_per_thread_events = 1,
+	},
+
+	/* Add additional metadata before here. */
+};
+
+void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_counter *c = &ctx->counter_block.counter[cid];
+
+	c->value += value;
+
+	ctx->used_any_counter = 1;
+	if (tr2_counter_metadata[cid].want_per_thread_events)
+		ctx->used_any_per_thread_counter = 1;
+}
+
+void tr2_update_final_counters(void)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_counter_id cid;
+
+	if (!ctx->used_any_counter)
+		return;
+
+	/*
+	 * Accessing `final_counter_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) {
+		struct tr2_counter *c_final = &final_counter_block.counter[cid];
+		const struct tr2_counter *c = &ctx->counter_block.counter[cid];
+
+		c_final->value += c->value;
+	}
+}
+
+void tr2_emit_per_thread_counters(tr2_tgt_evt_counter_t *fn_apply)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_counter_id cid;
+
+	if (!ctx->used_any_per_thread_counter)
+		return;
+
+	/*
+	 * For each counter, if the counter wants per-thread events
+	 * and this thread used it (the value is non-zero), emit it.
+	 */
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++)
+		if (tr2_counter_metadata[cid].want_per_thread_events &&
+		    ctx->counter_block.counter[cid].value)
+			fn_apply(&tr2_counter_metadata[cid],
+				 &ctx->counter_block.counter[cid],
+				 0);
+}
+
+void tr2_emit_final_counters(tr2_tgt_evt_counter_t *fn_apply)
+{
+	enum trace2_counter_id cid;
+
+	/*
+	 * Accessing `final_counter_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++)
+		if (final_counter_block.counter[cid].value)
+			fn_apply(&tr2_counter_metadata[cid],
+				 &final_counter_block.counter[cid],
+				 1);
+}
diff --git a/trace2/tr2_ctr.h b/trace2/tr2_ctr.h
new file mode 100644
index 00000000000..a2267ee9901
--- /dev/null
+++ b/trace2/tr2_ctr.h
@@ -0,0 +1,104 @@
+#ifndef TR2_CTR_H
+#define TR2_CTR_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow global "counters".
+ *
+ * Counters can be used to count interesting activity that does not fit
+ * the "region and data" model, such as code called from many
+ * different regions and/or where you want to count a number of items,
+ * but don't have control of when the last item will be processed,
+ * such as counting the number of calls to `lstat()`.
+ *
+ * Counters differ from Trace2 "data" events.  Data events are emitted
+ * immediately and are appropriate for documenting loop counters at
+ * the end of a region, for example.  Counter values are accumulated
+ * during the program and final counter values are emitted at program
+ * exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set of
+ * counters and counter ids using a fixed size "counter block" array
+ * in thread-local storage.  This gives us constant time, lock-free
+ * access to each counter within each thread.  This lets us avoid the
+ * complexities of dynamically allocating a counter and sharing that
+ * definition with other threads.
+ *
+ * Each thread uses the counter block in its thread-local storage to
+ * increment partial sums for each counter (without locking).  When a
+ * thread exits, those partial sums are (under lock) added to the
+ * global final sum.
+ *
+ * Partial sums for each counter are optionally emitted when a thread
+ * exits.
+ *
+ * Final sums for each counter are emitted between the "exit" and
+ * "atexit" events.
+ *
+ * A parallel "counter metadata" table contains the "category" and
+ * "name" fields for each counter.  This eliminates the need to
+ * include those args in the various counter APIs.
+ */
+
+/*
+ * The definition of an individual counter as used by an individual
+ * thread (and later in aggregation).
+ */
+struct tr2_counter {
+	uint64_t value;
+};
+
+/*
+ * Metadata for a counter.
+ */
+struct tr2_counter_metadata {
+	const char *category;
+	const char *name;
+
+	/*
+	 * True if we should emit per-thread events for this counter
+	 * when individual threads exit.
+	 */
+	unsigned int want_per_thread_events:1;
+};
+
+/*
+ * A compile-time fixed block of counters to insert into thread-local
+ * storage.  This wrapper is used to avoid quirks of C and the usual
+ * need to pass an array size argument.
+ */
+struct tr2_counter_block {
+	struct tr2_counter counter[TRACE2_NUMBER_OF_COUNTERS];
+};
+
+/*
+ * Private routines used by trace2.c to increment a counter for the
+ * current thread.
+ */
+void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value);
+
+/*
+ * Add the current thread's counter data to the global totals.
+ * This is called during thread-exit.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_update_final_counters(void);
+
+/*
+ * Emit per-thread counter data for the current thread.
+ * This is called during thread-exit.
+ */
+void tr2_emit_per_thread_counters(tr2_tgt_evt_counter_t *fn_apply);
+
+/*
+ * Emit global counter values.
+ * This is called during atexit handling.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_emit_final_counters(tr2_tgt_evt_counter_t *fn_apply);
+
+#endif /* TR2_CTR_H */
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 2a80bef0df5..94a334d980a 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -6,6 +6,8 @@ struct repository;
 struct json_writer;
 struct tr2_timer_metadata;
 struct tr2_timer;
+struct tr2_counter_metadata;
+struct tr2_counter;
 
 /*
  * Function prototypes for a TRACE2 "target" vtable.
@@ -102,6 +104,10 @@ typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta,
 				  const struct tr2_timer *timer,
 				  int is_final_data);
 
+typedef void(tr2_tgt_evt_counter_t)(const struct tr2_counter_metadata *meta,
+				    const struct tr2_counter *counter,
+				    int is_final_data);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -139,6 +145,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
 	tr2_tgt_evt_timer_t                     *pfn_timer;
+	tr2_tgt_evt_counter_t                   *pfn_counter;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 1196da89ba4..bb0653e0e6f 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -642,6 +642,24 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	jw_release(&jw);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	jw_object_string(&jw, "category", meta->category);
+	jw_object_string(&jw, "name", meta->name);
+	jw_object_intmax(&jw, "count", counter->value);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	.pdst = &tr2dst_event,
 
@@ -674,4 +692,5 @@ struct tr2_tgt tr2_tgt_event = {
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = NULL,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 3888c10ef50..b21508e06f7 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -351,6 +351,21 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, "%s %s/%s value:%"PRIu64,
+		    event_name, meta->category, meta->name,
+		    counter->value);
+
+	normal_io_write_fl(__FILE__, __LINE__, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_normal = {
 	.pdst = &tr2dst_normal,
 
@@ -383,4 +398,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	.pfn_data_json_fl = NULL,
 	.pfn_printf_va_fl = fn_printf_va_fl,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 89b30ddc0e4..188068ac5d0 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -578,6 +578,22 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, "name:%s value:%"PRIu64,
+		    meta->name,
+		    counter->value);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, NULL, NULL,
+			 meta->category, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	.pdst = &tr2dst_perf,
 
@@ -610,4 +626,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = fn_printf_va_fl,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 2322b0d0ef0..289b62d0721 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_tmr.h"
 
 /*
@@ -22,8 +23,11 @@ struct tr2tls_thread_ctx {
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 	struct tr2_timer_block timer_block;
+	struct tr2_counter_block counter_block;
 	unsigned int used_any_timer:1;
 	unsigned int used_any_per_thread_timer:1;
+	unsigned int used_any_counter:1;
+	unsigned int used_any_per_thread_counter:1;
 };
 
 /*
-- 
gitgitgadget
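
For readers who want the partial-sum scheme from this patch in isolation,
here is a minimal standalone sketch.  It is not the code from the patch:
the `demo_*` names are hypothetical, the thread context is passed in
explicitly instead of being fetched from thread-local storage, and the
merge helper takes the lock itself, whereas the patch expects the caller
to hold `tr2tls_mutex`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical counter ids; they start at 0 and are contiguous. */
enum demo_counter_id {
	DEMO_COUNTER_A = 0,
	DEMO_COUNTER_B,
	DEMO_NUMBER_OF_COUNTERS
};

/* Fixed-size block of counters, indexed by counter id. */
struct demo_counter_block {
	uint64_t value[DEMO_NUMBER_OF_COUNTERS];
};

/* Stand-in for the per-thread context kept in thread-local storage. */
struct demo_thread_ctx {
	struct demo_counter_block block;
	unsigned int used_any_counter:1;
};

/* Final totals across all threads; only touched under demo_mutex. */
static struct demo_counter_block demo_final_block;
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Add to one thread's partial sum.  No lock: the block is private. */
static void demo_counter_add(struct demo_thread_ctx *ctx,
			     enum demo_counter_id cid, uint64_t value)
{
	assert((unsigned int)cid < DEMO_NUMBER_OF_COUNTERS);
	ctx->block.value[cid] += value;
	ctx->used_any_counter = 1;
}

/* At "thread exit", fold the partial sums into the final totals. */
static void demo_update_final_counters(const struct demo_thread_ctx *ctx)
{
	int cid;

	if (!ctx->used_any_counter)
		return;

	pthread_mutex_lock(&demo_mutex);
	for (cid = 0; cid < DEMO_NUMBER_OF_COUNTERS; cid++)
		demo_final_block.value[cid] += ctx->block.value[cid];
	pthread_mutex_unlock(&demo_mutex);
}

/* Read one final total, again under the lock. */
static uint64_t demo_final_value(enum demo_counter_id cid)
{
	uint64_t v;

	assert((unsigned int)cid < DEMO_NUMBER_OF_COUNTERS);
	pthread_mutex_lock(&demo_mutex);
	v = demo_final_block.value[cid];
	pthread_mutex_unlock(&demo_mutex);
	return v;
}
```

With three contexts that each add 7 and 13 to the same counter, each
partial sum is 20 and the merged total is 60, which is the arithmetic
the 'global counter test/test2' case in t0211 relies on.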


* Re: [PATCH v2 4/7] trace2: rename the thread_name argument to trace2_thread_start
  2022-10-12 18:52   ` [PATCH v2 4/7] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
@ 2022-10-12 21:06     ` Ævar Arnfjörð Bjarmason
  2022-10-20 14:40       ` Jeff Hostetler
  2022-10-13 21:12     ` Junio C Hamano
  1 sibling, 1 reply; 73+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-12 21:06 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler, Derrick Stolee, Jeff Hostetler


On Wed, Oct 12 2022, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Rename the `thread_name` argument in `tr2tls_create_self()` and
> `trace2_thread_start()` to be `thread_base_name` to make it clearer
> that the passed argument is a component used in the construction of
> the actual `struct tr2tls_thread_ctx.thread_name` variable.
>
> The base name will be used along with the thread id to create a
> unique thread name.

Makes sense.

> This commit does not change how the `thread_name` field is
> allocated or stored within the `tr2tls_thread_ctx` structure.

What this commit does change though, which isn't mentioned here, is...

> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index 1297509fd23..7d1f03a2ea6 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -25,17 +25,20 @@ struct tr2tls_thread_ctx {
>  /*
>   * Create thread-local storage for the current thread.
>   *
> - * We assume the first thread is "main".  Other threads are given
> - * non-zero thread-ids to help distinguish messages from concurrent
> - * threads.
> - *
> - * Truncate the thread name if necessary to help with column alignment
> - * in printf-style messages.
> + * The first thread in the process will have:
> + *     { .thread_id=0, .thread_name="main" }
> + * Subsequent threads are given a non-zero thread_id and a thread_name
> + * constructed from the id and a thread base name (which is usually just
> + * the name of the thread-proc function).  For example:
> + *     { .thread_id=10, .thread_name="th10:fsm-listen" }
> + * This helps to identify and distinguish messages from concurrent threads.
> + * The ctx.thread_name field is truncated if necessary to help with column
> + * alignment in printf-style messages.

...this documentation, which I'd argue should be a separate change, as
nothing's changed about the state of the world with this rename of the
field, this was all true before this rename.


* Re: [PATCH v2 6/7] trace2: add stopwatch timers
  2022-10-12 18:52   ` [PATCH v2 6/7] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
@ 2022-10-13 21:12     ` Junio C Hamano
  2022-10-20 14:42       ` Jeff Hostetler
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2022-10-13 21:12 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Add stopwatch timer mechanism to Trace2.
>
>  trace2/tr2_tmr.c                       | 182 +++++++++++++++++++++++++
>  trace2/tr2_tmr.h                       | 140 +++++++++++++++++++
>  15 files changed, 784 insertions(+)
>  create mode 100644 trace2/tr2_tmr.c
>  create mode 100644 trace2/tr2_tmr.h

Whew.  That's a lot of new code and doc to make two calls to
getnanotime() and accumulate the differences.

It was irritating to count zeros in the same constant 1000000000.0
spelled out 9 times.  Perhaps something like

#define NS_TO_SECONDS(ns) ((double)(ns) / (1000*1000*1000.))

would have helped?

Other than that, all looked reasonable.

Thanks.
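
A small sketch of how the suggested macro might be used.  The macro is
the one proposed above; `demo_format_elapsed()` and its "elapsed:"
output format are hypothetical and are not taken from the timer patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Convert nanoseconds to (fractional) seconds, as suggested above. */
#define NS_TO_SECONDS(ns) ((double)(ns) / (1000*1000*1000.))

/*
 * Hypothetical helper: format the elapsed time between two
 * nanosecond timestamps as seconds with microsecond precision.
 * Returns the value returned by snprintf().
 */
static int demo_format_elapsed(char *buf, size_t size,
			       uint64_t ns_start, uint64_t ns_end)
{
	assert(ns_end >= ns_start);
	return snprintf(buf, size, "elapsed:%.6f",
			NS_TO_SECONDS(ns_end - ns_start));
}
```

Keeping the divisor in one place means each call site reads as a unit
conversion rather than as a literal whose zeros have to be counted.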


* Re: [PATCH v2 5/7] trace2: convert ctx.thread_name from strbuf to pointer
  2022-10-12 18:52   ` [PATCH v2 5/7] trace2: convert ctx.thread_name from strbuf to pointer Jeff Hostetler via GitGitGadget
@ 2022-10-13 21:12     ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2022-10-13 21:12 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
> to a "const char*" pointer.
>
> The `thread_name` field is a constant string that is constructed when
> the context is created.  Using a (non-const) `strbuf` structure for it
> caused some confusion in the past because it implied that someone
> could rename a thread after it was created.  That usage was not
> intended.  Change it to a const pointer to make the intent more clear.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  trace2/tr2_tgt_event.c |  2 +-
>  trace2/tr2_tgt_perf.c  |  2 +-
>  trace2/tr2_tls.c       | 16 +++++++++-------
>  trace2/tr2_tls.h       |  2 +-
>  4 files changed, 12 insertions(+), 10 deletions(-)

Looking good so far.


* Re: [PATCH v2 4/7] trace2: rename the thread_name argument to trace2_thread_start
  2022-10-12 18:52   ` [PATCH v2 4/7] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
  2022-10-12 21:06     ` Ævar Arnfjörð Bjarmason
@ 2022-10-13 21:12     ` Junio C Hamano
  1 sibling, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2022-10-13 21:12 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Rename the `thread_name` argument in `tr2tls_create_self()` and
> `trace2_thread_start()` to be `thread_base_name` to make it clearer
> that the passed argument is a component used in the construction of
> the actual `struct tr2tls_thread_ctx.thread_name` variable.
>
> The base name will be used along with the thread id to create a
> unique thread name.
> ...
> -struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
> +struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
>  					     uint64_t us_thread_start)
>  {
>  	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
> @@ -50,7 +50,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  	strbuf_init(&ctx->thread_name, 0);
>  	if (ctx->thread_id)
>  		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
> -	strbuf_addstr(&ctx->thread_name, thread_name);
> +	strbuf_addstr(&ctx->thread_name, thread_base_name);
>  	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
>  		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);

This hunk is very illustrative and highlights the difference between
thread_base_name parameter and .thread_name member in the context.

Good.
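
To make the distinction concrete, here is a standalone approximation of
that construction using `snprintf()` instead of a strbuf.  The names
`demo_build_thread_name()` and `DEMO_MAX_THREAD_NAME` are hypothetical,
and the value 24 is only an assumed stand-in for `TR2_MAX_THREAD_NAME`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed cap on the thread name length (stand-in value). */
#define DEMO_MAX_THREAD_NAME 24

/*
 * Build a thread name from a thread id and a base name.
 * Thread 0 keeps the base name as-is; other threads get a
 * "thNN:" prefix.  The result is truncated to fit `out_size`
 * (including the terminating NUL).  Returns the final length.
 */
static size_t demo_build_thread_name(char *out, size_t out_size,
				     int thread_id, const char *base_name)
{
	assert(out && out_size > 0 && base_name);

	if (thread_id)
		snprintf(out, out_size, "th%02d:%s", thread_id, base_name);
	else
		snprintf(out, out_size, "%s", base_name);

	return strlen(out);
}
```

A caller would pass a buffer of `DEMO_MAX_THREAD_NAME + 1` bytes, so the
base name "fsm-listen" with thread id 10 yields "th10:fsm-listen", and
longer base names are cut off at the cap.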



* Re: [PATCH v2 2/7] tr2tls: clarify TLS terminology
  2022-10-12 18:52   ` [PATCH v2 2/7] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
@ 2022-10-13 21:12     ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2022-10-13 21:12 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  `"thread_start"`::
>  	This event is generated when a thread is started.  It is
> -	generated from *within* the new thread's thread-proc (for TLS
> -	reasons).
> +	generated from *within* the new thread's thread-proc (because
> +	it needs to access data in the thread's thread-local storage).

This is a vast improvement, not just "TLS" -> "thread-local storage",
but the original "for TLS reasons" would not be understood by anybody
who does not already know.



* Re: [PATCH v2 4/7] trace2: rename the thread_name argument to trace2_thread_start
  2022-10-12 21:06     ` Ævar Arnfjörð Bjarmason
@ 2022-10-20 14:40       ` Jeff Hostetler
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-20 14:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Jeff Hostetler via GitGitGadget
  Cc: git, Derrick Stolee, Jeff Hostetler



On 10/12/22 5:06 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Oct 12 2022, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Rename the `thread_name` argument in `tr2tls_create_self()` and
>> `trace2_thread_start()` to be `thread_base_name` to make it clearer
>> that the passed argument is a component used in the construction of
>> the actual `struct tr2tls_thread_ctx.thread_name` variable.
>>
>> The base name will be used along with the thread id to create a
>> unique thread name.
> 
> Makes sense.
> 
>> This commit does not change how the `thread_name` field is
>> allocated or stored within the `tr2tls_thread_ctx` structure.
> 
> What this commit does change though, which isn't mentioned here, is...
> 
>> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
>> index 1297509fd23..7d1f03a2ea6 100644
>> --- a/trace2/tr2_tls.h
>> +++ b/trace2/tr2_tls.h
>> @@ -25,17 +25,20 @@ struct tr2tls_thread_ctx {
>>   /*
>>    * Create thread-local storage for the current thread.
>>    *
>> - * We assume the first thread is "main".  Other threads are given
>> - * non-zero thread-ids to help distinguish messages from concurrent
>> - * threads.
>> - *
>> - * Truncate the thread name if necessary to help with column alignment
>> - * in printf-style messages.
>> + * The first thread in the process will have:
>> + *     { .thread_id=0, .thread_name="main" }
>> + * Subsequent threads are given a non-zero thread_id and a thread_name
>> + * constructed from the id and a thread base name (which is usually just
>> + * the name of the thread-proc function).  For example:
>> + *     { .thread_id=10, .thread_name="th10:fsm-listen" }
>> + * This helps to identify and distinguish messages from concurrent threads.
>> + * The ctx.thread_name field is truncated if necessary to help with column
>> + * alignment in printf-style messages.
> 
> ...this documentation, which I'd argue should be a separate change, as
> nothing's changed about the state of the world with this rename of the
> field, this was all true before this rename.
> 

good point.  i'll split it and resend.
thanks
Jeff


* Re: [PATCH v2 6/7] trace2: add stopwatch timers
  2022-10-13 21:12     ` Junio C Hamano
@ 2022-10-20 14:42       ` Jeff Hostetler
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-20 14:42 UTC (permalink / raw)
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Jeff Hostetler



On 10/13/22 5:12 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Add stopwatch timer mechanism to Trace2.
[...]
> It was irritating to count zeros in the same constant 1000000000.0
> spelled out 9 times.  Perhaps something like
> 
> #define NS_TO_SECONDS(ns) ((double)(ns) / (1000*1000*1000.))
> 
> would have helped?

good point.  i'll resend.

thanks
jeff
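
For illustration only, the macro suggested above could be used roughly as follows. This is a minimal sketch: `struct demo_timer` and `demo_timer_total_seconds()` are hypothetical stand-ins and are not taken from the patch; only the `NS_TO_SECONDS()` macro text comes from the review comment.

```c
#include <stdint.h>

/* Macro as proposed in the review: nanoseconds to (double) seconds. */
#define NS_TO_SECONDS(ns) ((double)(ns) / (1000*1000*1000.))

/* Hypothetical stand-in for a timer record; not the real struct tr2_timer. */
struct demo_timer {
	uint64_t total_ns;
	uint64_t min_ns;
	uint64_t max_ns;
};

/* Hypothetical helper: total accumulated time in seconds. */
static double demo_timer_total_seconds(const struct demo_timer *t)
{
	return NS_TO_SECONDS(t->total_ns);
}
```

The point of the macro is that the constant is spelled once, so each call site only states which field is being converted.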


* [PATCH v3 0/8] Trace2 timers and counters and some cleanup
  2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
                     ` (6 preceding siblings ...)
  2022-10-12 18:52   ` [PATCH v2 7/7] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
@ 2022-10-20 18:28   ` Jeff Hostetler via GitGitGadget
  2022-10-20 18:28     ` [PATCH v3 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
                       ` (8 more replies)
  7 siblings, 9 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-20 18:28 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler

Here is version 3 of this series to add timers and counters to Trace2.

Changes since V1:

 * I dropped the commits concerning compiler errors in Clang 11.0.0 on
   MacOS. I've sent them to the mailing list in a separate series, since
   they had nothing to do with the main topic of this series.

 * I moved the documentation changes earlier in the series to get them out of
   the way (and eliminate the need to update them in later commits).

 * After a long conversation on the mailing list, I redid the two
   thread-name commits to simplify and hopefully eliminate the remaining
   misunderstandings and/or shortcomings of my previous attempt and
   explanations. We now use a "const char *" for the field in the thread-ctx
   that we format and detach from a strbuf during thread-start. The goal
   here is to move away from a modifiable strbuf in the thread-ctx itself
   (to avoid giving the appearance that a caller could modify the
   thread-name at some point, when that was not intended).

The last 2 commits add the stopwatch timers and the global counters and are
unchanged from the previous version.

Jeff Hostetler (8):
  trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  tr2tls: clarify TLS terminology
  api-trace2.txt: elminate section describing the public trace2 API
  trace2: rename the thread_name argument to trace2_thread_start
  trace2: improve thread-name documentation in the thread-context
  trace2: convert ctx.thread_name from strbuf to pointer
  trace2: add stopwatch timers
  trace2: add global counter mechanism

 Documentation/technical/api-trace2.txt | 190 +++++++++++++++++--------
 Makefile                               |   2 +
 t/helper/test-trace2.c                 | 187 ++++++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  95 +++++++++++++
 t/t0211/scrub_perf.perl                |   6 +
 trace2.c                               | 121 +++++++++++++++-
 trace2.h                               | 101 +++++++++++--
 trace2/tr2_ctr.c                       | 101 +++++++++++++
 trace2/tr2_ctr.h                       | 104 ++++++++++++++
 trace2/tr2_tgt.h                       |  16 +++
 trace2/tr2_tgt_event.c                 |  47 +++++-
 trace2/tr2_tgt_normal.c                |  39 +++++
 trace2/tr2_tgt_perf.c                  |  43 +++++-
 trace2/tr2_tls.c                       |  34 +++--
 trace2/tr2_tls.h                       |  55 ++++---
 trace2/tr2_tmr.c                       | 182 +++++++++++++++++++++++
 trace2/tr2_tmr.h                       | 140 ++++++++++++++++++
 17 files changed, 1361 insertions(+), 102 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h


base-commit: 3dcec76d9df911ed8321007b1d197c1a206dc164
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1373%2Fjeffhostetler%2Ftrace2-stopwatch-v4-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1373/jeffhostetler/trace2-stopwatch-v4-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1373

Range-diff vs v2:

 1:  6e7e4f3187e = 1:  6e7e4f3187e trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
 2:  9dee7a75903 = 2:  9dee7a75903 tr2tls: clarify TLS terminology
 3:  804dab9e1a7 = 3:  804dab9e1a7 api-trace2.txt: elminate section describing the public trace2 API
 4:  637b422b860 ! 4:  9adf9cee1a9 trace2: rename the thread_name argument to trace2_thread_start
     @@ trace2/tr2_tls.c: struct tr2tls_thread_ctx *tr2tls_create_self(const char *threa
      
       ## trace2/tr2_tls.h ##
      @@ trace2/tr2_tls.h: struct tr2tls_thread_ctx {
     - /*
     -  * Create thread-local storage for the current thread.
     -  *
     -- * We assume the first thread is "main".  Other threads are given
     -- * non-zero thread-ids to help distinguish messages from concurrent
     -- * threads.
     -- *
     -- * Truncate the thread name if necessary to help with column alignment
     -- * in printf-style messages.
     -+ * The first thread in the process will have:
     -+ *     { .thread_id=0, .thread_name="main" }
     -+ * Subsequent threads are given a non-zero thread_id and a thread_name
     -+ * constructed from the id and a thread base name (which is usually just
     -+ * the name of the thread-proc function).  For example:
     -+ *     { .thread_id=10, .thread_name="th10fsm-listen" }
     -+ * This helps to identify and distinguish messages from concurrent threads.
     -+ * The ctx.thread_name field is truncated if necessary to help with column
     -+ * alignment in printf-style messages.
     -  *
        * In this and all following functions the term "self" refers to the
        * current thread.
        */
 -:  ----------- > 5:  8cb206b7632 trace2: improve thread-name documentation in the thread-context
 5:  4bf78e356e2 = 6:  8a89e1aa238 trace2: convert ctx.thread_name from strbuf to pointer
 6:  dd6d8e2841b ! 7:  8e701109976 trace2: add stopwatch timers
     @@ trace2/tr2_tgt.h
       struct json_writer;
      +struct tr2_timer_metadata;
      +struct tr2_timer;
     ++
     ++#define NS_PER_SEC_D ((double)1000*1000*1000)
       
       /*
        * Function prototypes for a TRACE2 "target" vtable.
     @@ trace2/tr2_tgt_event.c: static void fn_data_json_fl(const char *file, int line,
      +{
      +	const char *event_name = is_final_data ? "timer" : "th_timer";
      +	struct json_writer jw = JSON_WRITER_INIT;
     -+	double t_total = ((double)timer->total_ns) / 1000000000.0;
     -+	double t_min = ((double)timer->min_ns) / 1000000000.0;
     -+	double t_max = ((double)timer->max_ns) / 1000000000.0;
     ++	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
     ++	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
     ++	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
      +
      +	jw_object_begin(&jw, 0);
      +	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
     @@ trace2/tr2_tgt_normal.c: static void fn_printf_va_fl(const char *file, int line,
      +{
      +	const char *event_name = is_final_data ? "timer" : "th_timer";
      +	struct strbuf buf_payload = STRBUF_INIT;
     -+	double t_total = ((double)timer->total_ns) / 1000000000.0;
     -+	double t_min = ((double)timer->min_ns) / 1000000000.0;
     -+	double t_max = ((double)timer->max_ns) / 1000000000.0;
     ++	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
     ++	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
     ++	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
      +
      +	strbuf_addf(&buf_payload, ("%s %s/%s"
      +				   " intervals:%"PRIu64
     @@ trace2/tr2_tgt_perf.c: static void fn_printf_va_fl(const char *file, int line,
      +{
      +	const char *event_name = is_final_data ? "timer" : "th_timer";
      +	struct strbuf buf_payload = STRBUF_INIT;
     -+	double t_total = ((double)timer->total_ns) / 1000000000.0;
     -+	double t_min = ((double)timer->min_ns) / 1000000000.0;
     -+	double t_max = ((double)timer->max_ns) / 1000000000.0;
     ++	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
     ++	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
     ++	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
      +
      +	strbuf_addf(&buf_payload, ("name:%s"
      +				   " intervals:%"PRIu64
 7:  cf012fcde37 ! 8:  5cd8bdde884 trace2: add global counter mechanism
     @@ trace2/tr2_tgt.h: struct repository;
      +struct tr2_counter_metadata;
      +struct tr2_counter;
       
     - /*
     -  * Function prototypes for a TRACE2 "target" vtable.
     + #define NS_PER_SEC_D ((double)1000*1000*1000)
     + 
      @@ trace2/tr2_tgt.h: typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta,
       				  const struct tr2_timer *timer,
       				  int is_final_data);

-- 
gitgitgadget


* [PATCH v3 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
@ 2022-10-20 18:28     ` Jeff Hostetler via GitGitGadget
  2022-10-20 18:28     ` [PATCH v3 2/8] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-20 18:28 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use "size_t" rather than "int" for the "alloc" and "nr_open_regions"
fields in the "tr2tls_thread_ctx".  These are used by ALLOC_GROW().

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tls.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index b1e327a928e..a90bd639d48 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -11,8 +11,8 @@
 struct tr2tls_thread_ctx {
 	struct strbuf thread_name;
 	uint64_t *array_us_start;
-	int alloc;
-	int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
+	size_t alloc;
+	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 };
 
-- 
gitgitgadget
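
As a rough illustration of why `size_t` suits these fields, here is a minimal, self-contained sketch of an array that grows on demand. The names `demo_ctx`, `demo_grow()` and `demo_push_region()` are hypothetical, and the growth formula is an assumption chosen for the example; it is not claimed to match git's `ALLOC_GROW()` macro.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the thread context; only the relevant fields. */
struct demo_ctx {
	uint64_t *array_us_start;
	size_t alloc;           /* capacity of array_us_start */
	size_t nr_open_regions; /* entries in use ("nr") */
};

/*
 * Ensure room for nr_needed entries.  The growth policy here is an
 * assumption for illustration.  Returns 0 on success, -1 on failure.
 */
static int demo_grow(struct demo_ctx *ctx, size_t nr_needed)
{
	if (nr_needed > ctx->alloc) {
		size_t new_alloc = (ctx->alloc + 16) * 3 / 2;
		uint64_t *p;

		if (new_alloc < nr_needed)
			new_alloc = nr_needed;
		p = realloc(ctx->array_us_start, new_alloc * sizeof(*p));
		if (!p)
			return -1;
		ctx->array_us_start = p;
		ctx->alloc = new_alloc;
	}
	return 0;
}

/* Record the start time of a newly opened region. */
static int demo_push_region(struct demo_ctx *ctx, uint64_t us_start)
{
	if (demo_grow(ctx, ctx->nr_open_regions + 1) < 0)
		return -1;
	ctx->array_us_start[ctx->nr_open_regions++] = us_start;
	return 0;
}
```

With both counters declared as `size_t`, the comparison and the size computation passed to `realloc()` use the same unsigned type and need no casts.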



* [PATCH v3 2/8] tr2tls: clarify TLS terminology
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
  2022-10-20 18:28     ` [PATCH v3 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
@ 2022-10-20 18:28     ` Jeff Hostetler via GitGitGadget
  2022-10-20 18:28     ` [PATCH v3 3/8] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
                       ` (6 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-20 18:28 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Reduce or eliminate use of the term "TLS" in the Trace2 code.

The term "TLS" has two popular meanings: "thread-local storage" and
"transport layer security".  In the Trace2 source, the term is associated
with the former.  There was concern on the mailing list about it referring
to the latter.

Update the source and documentation to eliminate the use of the "TLS" term
or replace it with the phrase "thread-local storage" to reduce ambiguity.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt |  8 ++++----
 trace2.c                               |  2 +-
 trace2.h                               | 10 +++++-----
 trace2/tr2_tls.c                       |  6 +++---
 trace2/tr2_tls.h                       | 18 +++++++++++-------
 5 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 2afa28bb5aa..431d424f9d5 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -685,8 +685,8 @@ The "exec_id" field is a command-unique id and is only useful if the
 
 `"thread_start"`::
 	This event is generated when a thread is started.  It is
-	generated from *within* the new thread's thread-proc (for TLS
-	reasons).
+	generated from *within* the new thread's thread-proc (because
+	it needs to access data in the thread's thread-local storage).
 +
 ------------
 {
@@ -698,7 +698,7 @@ The "exec_id" field is a command-unique id and is only useful if the
 
 `"thread_exit"`::
 	This event is generated when a thread exits.  It is generated
-	from *within* the thread's thread-proc (for TLS reasons).
+	from *within* the thread's thread-proc.
 +
 ------------
 {
@@ -1206,7 +1206,7 @@ worked on 508 items at offset 2032.  Thread "th04" worked on 508 items
 at offset 508.
 +
 This example also shows that thread names are assigned in a racy manner
-as each thread starts and allocates TLS storage.
+as each thread starts.
 
 Config (def param) Events::
 
diff --git a/trace2.c b/trace2.c
index 0c0a11e07d5..c1244e45ace 100644
--- a/trace2.c
+++ b/trace2.c
@@ -52,7 +52,7 @@ static struct tr2_tgt *tr2_tgt_builtins[] =
  * Force (rather than lazily) initialize any of the requested
  * builtin TRACE2 targets at startup (and before we've seen an
  * actual TRACE2 event call) so we can see if we need to setup
- * the TR2 and TLS machinery.
+ * private data structures and thread-local storage.
  *
  * Return the number of builtin targets enabled.
  */
diff --git a/trace2.h b/trace2.h
index 88d906ea830..af3c11694cc 100644
--- a/trace2.h
+++ b/trace2.h
@@ -73,8 +73,7 @@ void trace2_initialize_clock(void);
 /*
  * Initialize TRACE2 tracing facility if any of the builtin TRACE2
  * targets are enabled in the system config or the environment.
- * This includes setting up the Trace2 thread local storage (TLS).
- * Emits a 'version' message containing the version of git
+ * This emits a 'version' message containing the version of git
  * and the Trace2 protocol.
  *
  * This function should be called from `main()` as early as possible in
@@ -302,7 +301,8 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code);
 
 /*
  * Emit a 'thread_start' event.  This must be called from inside the
- * thread-proc to set up the trace2 TLS data for the thread.
+ * thread-proc to allow the thread to create its own thread-local
+ * storage.
  *
  * Thread names should be descriptive, like "preload_index".
  * Thread names will be decorated with an instance number automatically.
@@ -315,8 +315,8 @@ void trace2_thread_start_fl(const char *file, int line,
 
 /*
  * Emit a 'thread_exit' event.  This must be called from inside the
- * thread-proc to report thread-specific data and cleanup TLS data
- * for the thread.
+ * thread-proc so that the thread can access and clean up its
+ * thread-local storage.
  */
 void trace2_thread_exit_fl(const char *file, int line);
 
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 7da94aba522..8d2182fbdbb 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -69,9 +69,9 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void)
 	ctx = pthread_getspecific(tr2tls_key);
 
 	/*
-	 * If the thread-proc did not call trace2_thread_start(), we won't
-	 * have any TLS data associated with the current thread.  Fix it
-	 * here and silently continue.
+	 * If the current thread's thread-proc did not call
+	 * trace2_thread_start(), then the thread will not have any
+	 * thread-local storage.  Create it now and silently continue.
 	 */
 	if (!ctx)
 		ctx = tr2tls_create_self("unknown", getnanotime() / 1000);
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index a90bd639d48..1297509fd23 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -3,6 +3,12 @@
 
 #include "strbuf.h"
 
+/*
+ * Notice: the term "TLS" refers to "thread-local storage" in the
+ * Trace2 source files.  This usage is borrowed from GCC and Windows.
+ * There is NO relation to "transport layer security".
+ */
+
 /*
  * Arbitry limit for thread names for column alignment.
  */
@@ -17,9 +23,7 @@ struct tr2tls_thread_ctx {
 };
 
 /*
- * Create TLS data for the current thread.  This gives us a place to
- * put per-thread data, such as thread start time, function nesting
- * and a per-thread label for our messages.
+ * Create thread-local storage for the current thread.
  *
  * We assume the first thread is "main".  Other threads are given
  * non-zero thread-ids to help distinguish messages from concurrent
@@ -35,7 +39,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 					     uint64_t us_thread_start);
 
 /*
- * Get our TLS data.
+ * Get the thread-local storage pointer of the current thread.
  */
 struct tr2tls_thread_ctx *tr2tls_get_self(void);
 
@@ -45,7 +49,7 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void);
 int tr2tls_is_main_thread(void);
 
 /*
- * Free our TLS data.
+ * Free the current thread's thread-local storage.
  */
 void tr2tls_unset_self(void);
 
@@ -81,12 +85,12 @@ uint64_t tr2tls_region_elasped_self(uint64_t us);
 uint64_t tr2tls_absolute_elapsed(uint64_t us);
 
 /*
- * Initialize the tr2 TLS system.
+ * Initialize thread-local storage for Trace2.
  */
 void tr2tls_init(void);
 
 /*
- * Free all tr2 TLS resources.
+ * Free all Trace2 thread-local storage resources.
  */
 void tr2tls_release(void);
 
-- 
gitgitgadget
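
For readers unfamiliar with the "thread-local storage" sense of the term, the pattern can be sketched with POSIX thread-specific data. This is a simplified illustration only: every `demo_*` name is hypothetical, and the sketch does not reproduce the actual code in `trace2/tr2_tls.c`. It does mirror the behavior described in the comment above, where a missing context is created on first access.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread context. */
struct demo_thread_ctx {
	int thread_id;
};

static pthread_key_t demo_key;
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_next_thread_id;

/* Destructor run by pthreads when a thread with a non-NULL value exits. */
static void demo_free_ctx(void *p)
{
	free(p);
}

/* Create the key once, before any thread asks for its context. */
static int demo_tls_init(void)
{
	return pthread_key_create(&demo_key, demo_free_ctx);
}

/* Allocate a context for the calling thread and store it under the key. */
static struct demo_thread_ctx *demo_create_self(void)
{
	struct demo_thread_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		abort();
	pthread_mutex_lock(&demo_mutex);
	ctx->thread_id = demo_next_thread_id++;
	pthread_mutex_unlock(&demo_mutex);
	pthread_setspecific(demo_key, ctx);
	return ctx;
}

/* Return the calling thread's context, creating it if it is missing. */
static struct demo_thread_ctx *demo_get_self(void)
{
	struct demo_thread_ctx *ctx = pthread_getspecific(demo_key);

	if (!ctx)
		ctx = demo_create_self();
	return ctx;
}
```

Each thread sees only the pointer it stored itself, which is why the start and exit events have to be emitted from inside the thread-proc.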



* [PATCH v3 3/8] api-trace2.txt: elminate section describing the public trace2 API
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
  2022-10-20 18:28     ` [PATCH v3 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
  2022-10-20 18:28     ` [PATCH v3 2/8] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
@ 2022-10-20 18:28     ` Jeff Hostetler via GitGitGadget
  2022-10-20 18:28     ` [PATCH v3 4/8] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-20 18:28 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Eliminate the mostly obsolete `Public API` sub-section from the
`Trace2 API` section in the documentation.  Strengthen the referral
to `trace2.h`.

Most of the technical information in this sub-section was moved to
`trace2.h` in 6c51cb525d (trace2: move doc to trace2.h, 2019-11-17) to
be adjacent to the function prototypes.  The remaining text wasn't
that useful by itself.

Furthermore, the text would need a bit of overhaul to add routines
that do not immediately generate a message, such as stopwatch timers.
So it seemed simpler to just get rid of it.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt | 61 +++-----------------------
 1 file changed, 7 insertions(+), 54 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 431d424f9d5..9d43909d068 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -148,20 +148,18 @@ filename collisions).
 
 == Trace2 API
 
-All public Trace2 functions and macros are defined in `trace2.h` and
-`trace2.c`.  All public symbols are prefixed with `trace2_`.
+The Trace2 public API is defined and documented in `trace2.h`; refer to it for
+more information.  All public functions and macros are prefixed
+with `trace2_` and are implemented in `trace2.c`.
 
 There are no public Trace2 data structures.
 
 The Trace2 code also defines a set of private functions and data types
 in the `trace2/` directory.  These symbols are prefixed with `tr2_`
-and should only be used by functions in `trace2.c`.
+and should only be used by functions in `trace2.c` (or other private
+source files in `trace2/`).
 
-== Conventions for Public Functions and Macros
-
-The functions defined by the Trace2 API are declared and documented
-in `trace2.h`.  It defines the API functions and wrapper macros for
-Trace2.
+=== Conventions for Public Functions and Macros
 
 Some functions have a `_fl()` suffix to indicate that they take `file`
 and `line-number` arguments.
@@ -172,52 +170,7 @@ take a `va_list` argument.
 Some functions have a `_printf_fl()` suffix to indicate that they also
 take a `printf()` style format with a variable number of arguments.
 
-There are CPP wrapper macros and `#ifdef`s to hide most of these details.
-See `trace2.h` for more details.  The following discussion will only
-describe the simplified forms.
-
-== Public API
-
-All Trace2 API functions send a message to all of the active
-Trace2 Targets.  This section describes the set of available
-messages.
-
-It helps to divide these functions into groups for discussion
-purposes.
-
-=== Basic Command Messages
-
-These are concerned with the lifetime of the overall git process.
-e.g: `void trace2_initialize_clock()`, `void trace2_initialize()`,
-`int trace2_is_enabled()`, `void trace2_cmd_start(int argc, const char **argv)`.
-
-=== Command Detail Messages
-
-These are concerned with describing the specific Git command
-after the command line, config, and environment are inspected.
-e.g: `void trace2_cmd_name(const char *name)`,
-`void trace2_cmd_mode(const char *mode)`.
-
-=== Child Process Messages
-
-These are concerned with the various spawned child processes,
-including shell scripts, git commands, editors, pagers, and hooks.
-
-e.g: `void trace2_child_start(struct child_process *cmd)`.
-
-=== Git Thread Messages
-
-These messages are concerned with Git thread usage.
-
-e.g: `void trace2_thread_start(const char *thread_name)`.
-
-=== Region and Data Messages
-
-These are concerned with recording performance data
-over regions or spans of code. e.g:
-`void trace2_region_enter(const char *category, const char *label, const struct repository *repo)`.
-
-Refer to trace2.h for details about all trace2 functions.
+CPP wrapper macros are defined to hide most of these details.
 
 == Trace2 Target Formats
 
-- 
gitgitgadget



* [PATCH v3 4/8] trace2: rename the thread_name argument to trace2_thread_start
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-10-20 18:28     ` [PATCH v3 3/8] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
@ 2022-10-20 18:28     ` Jeff Hostetler via GitGitGadget
  2022-10-20 18:28     ` [PATCH v3 5/8] trace2: improve thread-name documentation in the thread-context Jeff Hostetler via GitGitGadget
                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-20 18:28 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Rename the `thread_name` argument in `tr2tls_create_self()` and
`trace2_thread_start()` to be `thread_base_name` to make it clearer
that the passed argument is a component used in the construction of
the actual `struct tr2tls_thread_ctx.thread_name` variable.

The base name will be used along with the thread id to create a
unique thread name.

This commit does not change how the `thread_name` field is
allocated or stored within the `tr2tls_thread_ctx` structure.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2.c         |  6 +++---
 trace2.h         | 11 ++++++-----
 trace2/tr2_tls.c |  4 ++--
 trace2/tr2_tls.h |  2 +-
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/trace2.c b/trace2.c
index c1244e45ace..165264dc79a 100644
--- a/trace2.c
+++ b/trace2.c
@@ -466,7 +466,7 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code)
 				file, line, us_elapsed_absolute, exec_id, code);
 }
 
-void trace2_thread_start_fl(const char *file, int line, const char *thread_name)
+void trace2_thread_start_fl(const char *file, int line, const char *thread_base_name)
 {
 	struct tr2_tgt *tgt_j;
 	int j;
@@ -488,14 +488,14 @@ void trace2_thread_start_fl(const char *file, int line, const char *thread_name)
 		 */
 		trace2_region_enter_printf_fl(file, line, NULL, NULL, NULL,
 					      "thread-proc on main: %s",
-					      thread_name);
+					      thread_base_name);
 		return;
 	}
 
 	us_now = getnanotime() / 1000;
 	us_elapsed_absolute = tr2tls_absolute_elapsed(us_now);
 
-	tr2tls_create_self(thread_name, us_now);
+	tr2tls_create_self(thread_base_name, us_now);
 
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_thread_start_fl)
diff --git a/trace2.h b/trace2.h
index af3c11694cc..74cdb1354f7 100644
--- a/trace2.h
+++ b/trace2.h
@@ -304,14 +304,15 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code);
  * thread-proc to allow the thread to create its own thread-local
  * storage.
  *
- * Thread names should be descriptive, like "preload_index".
- * Thread names will be decorated with an instance number automatically.
+ * The thread base name should be descriptive, like "preload_index" or
+ * taken from the thread-proc function.  A unique thread name will be
+ * created from the given base name and the thread id automatically.
  */
 void trace2_thread_start_fl(const char *file, int line,
-			    const char *thread_name);
+			    const char *thread_base_name);
 
-#define trace2_thread_start(thread_name) \
-	trace2_thread_start_fl(__FILE__, __LINE__, (thread_name))
+#define trace2_thread_start(thread_base_name) \
+	trace2_thread_start_fl(__FILE__, __LINE__, (thread_base_name))
 
 /*
  * Emit a 'thread_exit' event.  This must be called from inside the
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 8d2182fbdbb..4f7c516ecb6 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -31,7 +31,7 @@ void tr2tls_start_process_clock(void)
 	tr2tls_us_start_process = getnanotime() / 1000;
 }
 
-struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
+struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 					     uint64_t us_thread_start)
 {
 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
@@ -50,7 +50,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 	strbuf_init(&ctx->thread_name, 0);
 	if (ctx->thread_id)
 		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_name);
+	strbuf_addstr(&ctx->thread_name, thread_base_name);
 	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
 		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
 
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 1297509fd23..d4e725f430b 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -35,7 +35,7 @@ struct tr2tls_thread_ctx {
  * In this and all following functions the term "self" refers to the
  * current thread.
  */
-struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
+struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 					     uint64_t us_thread_start);
 
 /*
-- 
gitgitgadget



* [PATCH v3 5/8] trace2: improve thread-name documentation in the thread-context
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-10-20 18:28     ` [PATCH v3 4/8] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
@ 2022-10-20 18:28     ` Jeff Hostetler via GitGitGadget
  2022-10-20 18:57       ` Ævar Arnfjörð Bjarmason
  2022-10-20 18:28     ` [PATCH v3 6/8] trace2: convert ctx.thread_name from strbuf to pointer Jeff Hostetler via GitGitGadget
                       ` (3 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-20 18:28 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Improve the documentation of the tr2tls_thread_ctx.thread_name field
and its relation to the tr2tls_thread_ctx.thread_id field.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tls.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index d4e725f430b..7d1f03a2ea6 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -25,12 +25,15 @@ struct tr2tls_thread_ctx {
 /*
  * Create thread-local storage for the current thread.
  *
- * We assume the first thread is "main".  Other threads are given
- * non-zero thread-ids to help distinguish messages from concurrent
- * threads.
- *
- * Truncate the thread name if necessary to help with column alignment
- * in printf-style messages.
+ * The first thread in the process will have:
+ *     { .thread_id=0, .thread_name="main" }
+ * Subsequent threads are given a non-zero thread_id and a thread_name
+ * constructed from the id and a thread base name (which is usually just
+ * the name of the thread-proc function).  For example:
+ *     { .thread_id=10, .thread_name="th10fsm-listen" }
+ * This helps to identify and distinguish messages from concurrent threads.
+ * The ctx.thread_name field is truncated if necessary to help with column
+ * alignment in printf-style messages.
  *
  * In this and all following functions the term "self" refers to the
  * current thread.
-- 
gitgitgadget
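
The name construction described in this comment can be sketched in a few lines of plain C. This is an illustration under stated assumptions: `demo_format_thread_name()` and `DEMO_MAX_THREAD_NAME` are hypothetical, the limit of 24 is assumed rather than taken from this thread, and the `"th%02d:"` prefix follows the format string visible in the `tr2tls_create_self()` context lines of patch 4/8.

```c
#include <stdio.h>
#include <string.h>

/* Assumed column width; stands in for TR2_MAX_THREAD_NAME. */
#define DEMO_MAX_THREAD_NAME 24

/*
 * Build a thread name from a thread id and a base name.
 * Thread 0 keeps the bare base name; other threads get a "thNN:" prefix.
 * The result is truncated to DEMO_MAX_THREAD_NAME characters.
 */
static void demo_format_thread_name(char *out, size_t outlen,
				    int thread_id, const char *base_name)
{
	if (thread_id)
		snprintf(out, outlen, "th%02d:%s", thread_id, base_name);
	else
		snprintf(out, outlen, "%s", base_name);

	/* snprintf() already limits output to outlen - 1 characters. */
	if (outlen > DEMO_MAX_THREAD_NAME &&
	    strlen(out) > DEMO_MAX_THREAD_NAME)
		out[DEMO_MAX_THREAD_NAME] = '\0';
}
```

Truncating to a fixed width keeps the thread column aligned in printf-style output, at the cost of possibly cutting long base names short.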



* [PATCH v3 6/8] trace2: convert ctx.thread_name from strbuf to pointer
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-10-20 18:28     ` [PATCH v3 5/8] trace2: improve thread-name documentation in the thread-context Jeff Hostetler via GitGitGadget
@ 2022-10-20 18:28     ` Jeff Hostetler via GitGitGadget
  2022-10-20 18:28     ` [PATCH v3 7/8] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-20 18:28 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
to a "const char*" pointer.

The `thread_name` field is a constant string that is constructed when
the context is created.  Using a (non-const) `strbuf` structure for it
caused some confusion in the past because it implied that someone
could rename a thread after it was created.  That usage was not
intended.  Change it to a const pointer to make the intent clearer.
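
As an illustration only (this is not code from the patch), the same
"build once, then treat as constant" idea can be sketched in plain C
without strbuf.  The names sketch_make_thread_name() and
SKETCH_MAX_THREAD_NAME are hypothetical; the "th%02d:" prefix and the
24-character limit are taken from the diff below.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for TR2_MAX_THREAD_NAME. */
#define SKETCH_MAX_THREAD_NAME 24

/*
 * Build the thread name once and hand it back as a const pointer.
 * The caller owns the memory and must free() it (casting away const).
 */
static const char *sketch_make_thread_name(int thread_id, const char *base)
{
	char prefix[16] = "";
	size_t len;
	char *name;

	assert(base != NULL);

	/* Only non-main threads (non-zero id) get the "thNN:" prefix. */
	if (thread_id)
		snprintf(prefix, sizeof(prefix), "th%02d:", thread_id);

	len = strlen(prefix) + strlen(base);
	name = malloc(len + 1);
	if (!name)
		return NULL;

	strcpy(name, prefix);
	strcat(name, base);

	/* Truncate to keep columns aligned in printf-style output. */
	if (len > SKETCH_MAX_THREAD_NAME)
		name[SKETCH_MAX_THREAD_NAME] = '\0';

	return name;
}
```

Because the returned pointer is const, a caller cannot append to or
rename the thread by accident; the only remaining mutation is the
final free((char *)name), which mirrors what tr2tls_unset_self() does
in this patch.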

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tgt_event.c |  2 +-
 trace2/tr2_tgt_perf.c  |  2 +-
 trace2/tr2_tls.c       | 16 +++++++++-------
 trace2/tr2_tls.h       |  2 +-
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 37a3163be12..52f9356c695 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -90,7 +90,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread", ctx->thread_name.buf);
+	jw_object_string(jw, "thread", ctx->thread_name);
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 8cb792488c8..59ca58f862d 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -108,7 +108,7 @@ static void perf_fmt_prepare(const char *event_name,
 
 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
 	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
-		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
+		    ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
 		    event_name);
 
 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 4f7c516ecb6..3a67532aae4 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 					     uint64_t us_thread_start)
 {
 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
+	struct strbuf buf = STRBUF_INIT;
 
 	/*
 	 * Implicitly "tr2tls_push_self()" to capture the thread's start
@@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 
 	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
 
-	strbuf_init(&ctx->thread_name, 0);
+	strbuf_init(&buf, 0);
 	if (ctx->thread_id)
-		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_base_name);
-	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
-		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
+		strbuf_addf(&buf, "th%02d:", ctx->thread_id);
+	strbuf_addstr(&buf, thread_base_name);
+	if (buf.len > TR2_MAX_THREAD_NAME)
+		strbuf_setlen(&buf, TR2_MAX_THREAD_NAME);
+	ctx->thread_name = strbuf_detach(&buf, NULL);
 
 	pthread_setspecific(tr2tls_key, ctx);
 
@@ -95,7 +97,7 @@ void tr2tls_unset_self(void)
 
 	pthread_setspecific(tr2tls_key, NULL);
 
-	strbuf_release(&ctx->thread_name);
+	free((char *)ctx->thread_name);
 	free(ctx->array_us_start);
 	free(ctx);
 }
@@ -113,7 +115,7 @@ void tr2tls_pop_self(void)
 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 
 	if (!ctx->nr_open_regions)
-		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
+		BUG("no open regions in thread '%s'", ctx->thread_name);
 
 	ctx->nr_open_regions--;
 }
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 7d1f03a2ea6..e17cc462f87 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -15,7 +15,7 @@
 #define TR2_MAX_THREAD_NAME (24)
 
 struct tr2tls_thread_ctx {
-	struct strbuf thread_name;
+	const char *thread_name;
 	uint64_t *array_us_start;
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
-- 
gitgitgadget



* [PATCH v3 7/8] trace2: add stopwatch timers
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-10-20 18:28     ` [PATCH v3 6/8] trace2: convert ctx.thread_name from strbuf to pointer Jeff Hostetler via GitGitGadget
@ 2022-10-20 18:28     ` Jeff Hostetler via GitGitGadget
  2022-10-20 20:25       ` Junio C Hamano
  2022-10-20 18:28     ` [PATCH v3 8/8] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
  8 siblings, 1 reply; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-20 18:28 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add a stopwatch timer mechanism to Trace2.

Timers are an alternative to Trace2 Regions.  Regions are useful for
measuring the time spent in various computation phases, such as the
time to read the index, time to scan for unstaged files, time to scan
for untracked files, etc.

However, regions are not appropriate in all places.  For example,
during a checkout, it would be very inefficient to use regions to
measure the total time spent inflating objects from the ODB across
the entire lifetime of the process: a per-unzip() region would flood
the output and significantly slow the command, and some form of
post-processing would be required to compute the time spent in unzip().

Timers can be used to measure a series of timer intervals and emit
a single summary event (at thread and/or process exit).
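
As a rough sketch of the accumulation idea (not the code in this
patch), a stopwatch timer only needs a handful of counters per thread.
The struct and function names below are hypothetical, and the clock
value is passed in by the caller rather than read from getnanotime()
so that the example stays self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread timer state. */
struct sketch_timer {
	uint64_t total_ns;        /* sum of all closed intervals */
	uint64_t min_ns;          /* shortest closed interval */
	uint64_t max_ns;          /* longest closed interval */
	uint64_t start_ns;        /* start of the currently open interval */
	uint64_t interval_count;  /* number of closed intervals */
	unsigned int recursion_count;
};

static void sketch_timer_start(struct sketch_timer *t, uint64_t now_ns)
{
	if (t->recursion_count++)
		return; /* nested start: keep the outer start time */
	t->start_ns = now_ns;
}

static void sketch_timer_stop(struct sketch_timer *t, uint64_t now_ns)
{
	uint64_t interval_ns;

	assert(t->recursion_count > 0);
	if (--t->recursion_count)
		return; /* still inside an outer start/stop pair */

	interval_ns = now_ns - t->start_ns;
	t->total_ns += interval_ns;

	/*
	 * The struct is zero-initialized, so the first interval must
	 * seed min_ns (and max_ns) rather than be compared against 0.
	 */
	if (!t->interval_count || interval_ns < t->min_ns)
		t->min_ns = interval_ns;
	if (!t->interval_count || interval_ns > t->max_ns)
		t->max_ns = interval_ns;

	t->interval_count++;
}
```

Nothing is emitted on start or stop; the counters are read only when
the summary event is produced, which is why this is cheaper than one
region per call.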

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt |  90 ++++++++++++
 Makefile                               |   1 +
 t/helper/test-trace2.c                 |  98 +++++++++++++
 t/t0211-trace2-perf.sh                 |  49 +++++++
 t/t0211/scrub_perf.perl                |   6 +
 trace2.c                               |  75 ++++++++++
 trace2.h                               |  43 ++++++
 trace2/tr2_tgt.h                       |   9 ++
 trace2/tr2_tgt_event.c                 |  26 ++++
 trace2/tr2_tgt_normal.c                |  23 ++++
 trace2/tr2_tgt_perf.c                  |  24 ++++
 trace2/tr2_tls.c                       |  10 ++
 trace2/tr2_tls.h                       |  10 ++
 trace2/tr2_tmr.c                       | 182 +++++++++++++++++++++++++
 trace2/tr2_tmr.h                       | 140 +++++++++++++++++++
 15 files changed, 786 insertions(+)
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 9d43909d068..75ce6f45603 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -769,6 +769,42 @@ The "value" field may be an integer or a string.
 }
 ------------
 
+`"th_timer"`::
+	This event logs the amount of time that a stopwatch timer was
+	running in the thread.  This event is generated when a thread
+	exits for timers that requested per-thread events.
++
+------------
+{
+	"event":"th_timer",
+	...
+	"category":"my_category",
+	"name":"my_timer",
+	"intervals":5,         # number of times it was started/stopped
+	"t_total":0.052741,    # total time in seconds it was running
+	"t_min":0.010061,      # shortest interval
+	"t_max":0.011648       # longest interval
+}
+------------
+
+`"timer"`::
+	This event logs the amount of time that a stopwatch timer was
+	running aggregated across all threads.  This event is generated
+	when the process exits.
++
+------------
+{
+	"event":"timer",
+	...
+	"category":"my_category",
+	"name":"my_timer",
+	"intervals":5,         # number of times it was started/stopped
+	"t_total":0.052741,    # total time in seconds it was running
+	"t_min":0.010061,      # shortest interval
+	"t_max":0.011648       # longest interval
+}
+------------
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
@@ -1200,6 +1236,60 @@ d0 | main                     | data         | r0  |  0.002126 |  0.002126 | fsy
 d0 | main                     | exit         |     |  0.000470 |           |              | code:0
 d0 | main                     | atexit       |     |  0.000477 |           |              | code:0
 ----------------
+
+Stopwatch Timer Events::
+
+	Measure the time spent in a function call or span of code
+	that might be called from many places within the code
+	throughout the life of the process.
++
+----------------
+static void expensive_function(void)
+{
+	trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+	...
+	sleep_millisec(1000); // Do something expensive
+	...
+	trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+}
+
+static int ut_100timer(int argc, const char **argv)
+{
+	...
+
+	expensive_function();
+
+	// Do something else 1...
+
+	expensive_function();
+
+	// Do something else 2...
+
+	expensive_function();
+
+	return 0;
+}
+----------------
++
+In this example, we measure the total time spent in
+`expensive_function()` regardless of when it is called
+in the overall flow of the program.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ t/helper/test-tool trace2 100timer 3 1000
+...
+$ cat ~/log.perf
+d0 | main                     | version      |     |           |           |              | ...
+d0 | main                     | start        |     |  0.001453 |           |              | t/helper/test-tool trace2 100timer 3 1000
+d0 | main                     | cmd_name     |     |           |           |              | trace2 (trace2)
+d0 | main                     | exit         |     |  3.003667 |           |              | code:0
+d0 | main                     | timer        |     |           |           | test         | name:test1 intervals:3 total:3.001686 min:1.000254 max:1.000929
+d0 | main                     | atexit       |     |  3.003796 |           |              | code:0
+----------------
+
+
 == Future Work
 
 === Relationship to the Existing Trace Api (api-trace.txt)
diff --git a/Makefile b/Makefile
index cac3452edb9..820649bf62a 100644
--- a/Makefile
+++ b/Makefile
@@ -1102,6 +1102,7 @@ LIB_OBJS += trace2/tr2_tgt_event.o
 LIB_OBJS += trace2/tr2_tgt_normal.o
 LIB_OBJS += trace2/tr2_tgt_perf.o
 LIB_OBJS += trace2/tr2_tls.o
+LIB_OBJS += trace2/tr2_tmr.o
 LIB_OBJS += trailer.o
 LIB_OBJS += transport-helper.o
 LIB_OBJS += transport.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index a714130ece7..f951b9e97d7 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -228,6 +228,101 @@ static int ut_010bug_BUG(int argc, const char **argv)
 	BUG("a %s message", "BUG");
 }
 
+/*
+ * Single-threaded timer test.  Create several intervals using the
+ * TEST1 timer.  The test script can verify that an aggregate Trace2
+ * "timer" event is emitted indicating that we started+stopped the
+ * timer the requested number of times.
+ */
+static int ut_100timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay>";
+
+	int count = 0;
+	int delay = 0;
+	int k;
+
+	if (argc != 2)
+		die("%s", usage_error);
+	if (get_i(&count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&delay, argv[1]))
+		die("%s", usage_error);
+
+	for (k = 0; k < count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+		sleep_millisec(delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+	}
+
+	return 0;
+}
+
+struct ut_101_data {
+	int count;
+	int delay;
+};
+
+static void *ut_101timer_thread_proc(void *_ut_101_data)
+{
+	struct ut_101_data *data = _ut_101_data;
+	int k;
+
+	trace2_thread_start("ut_101");
+
+	for (k = 0; k < data->count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST2);
+		sleep_millisec(data->delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST2);
+	}
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+/*
+ * Multi-threaded timer test.  Create several threads that each create
+ * several intervals using the TEST2 timer.  The test script can verify
+ * that individual Trace2 "th_timer" events for each thread and an
+ * aggregate "timer" event are generated.
+ */
+static int ut_101timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay> <threads>";
+
+	struct ut_101_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.delay, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_101timer_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -248,6 +343,9 @@ static struct unit_test ut_table[] = {
 	{ ut_008bug,      "008bug",    "" },
 	{ ut_009bug_BUG,  "009bug_BUG","" },
 	{ ut_010bug_BUG,  "010bug_BUG","" },
+
+	{ ut_100timer,    "100timer",  "<count> <ms_delay>" },
+	{ ut_101timer,    "101timer",  "<count> <ms_delay> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 22d0845544e..5c28424e657 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -173,4 +173,53 @@ test_expect_success 'using global config, perf stream, return code 0' '
 	test_cmp expect actual
 '
 
+# Exercise the stopwatch timers in a loop and confirm that we have
+# as many start/stop intervals as expected.  We cannot really test the
+# actual (total, min, max) timer values, so we have to assume that they
+# are good, but we can verify the interval count.
+#
+# The timer "test/test1" should only emit a global summary "timer" event.
+# The timer "test/test2" should emit per-thread "th_timer" events and a
+# global summary "timer" event.
+
+have_timer_event () {
+	thread=$1 event=$2 category=$3 name=$4 intervals=$5 file=$6 &&
+
+	pattern="d0|${thread}|${event}||||${category}|name:${name} intervals:${intervals}" &&
+
+	grep "${pattern}" ${file}
+}
+
+test_expect_success 'stopwatch timer test/test1' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the timer "test1" 5 times from "main".
+	test-tool trace2 100timer 5 10 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_timer_event "main" "timer" "test" "test1" 5 actual
+'
+
+test_expect_success 'stopwatch timer test/test2' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the timer "test2" 5 times each in 3 threads.
+	test-tool trace2 101timer 5 10 3 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	# So we should have 3 per-thread events of 5 each.
+	have_timer_event "th01:ut_101" "th_timer" "test" "test2" 5 actual &&
+	have_timer_event "th02:ut_101" "th_timer" "test" "test2" 5 actual &&
+	have_timer_event "th03:ut_101" "th_timer" "test" "test2" 5 actual &&
+
+	# And we should have 15 total uses.
+	have_timer_event "main" "timer" "test" "test2" 15 actual
+'
+
 test_done
diff --git a/t/t0211/scrub_perf.perl b/t/t0211/scrub_perf.perl
index 299999f0f89..7a50bae6463 100644
--- a/t/t0211/scrub_perf.perl
+++ b/t/t0211/scrub_perf.perl
@@ -64,6 +64,12 @@ while (<>) {
 	    goto SKIP_LINE;
 	}
     }
+    elsif ($tokens[$col_event] =~ m/timer/) {
+	# This also captures "th_timer" events
+	$tokens[$col_rest] =~ s/ total:\d+\.\d*/ total:_T_TOTAL_/;
+	$tokens[$col_rest] =~ s/ min:\d+\.\d*/ min:_T_MIN_/;
+	$tokens[$col_rest] =~ s/ max:\d+\.\d*/ max:_T_MAX_/;
+    }
 
     # t_abs and t_rel are either blank or a float.  Replace the float
     # with a constant for matching the HEREDOC in the test script.
diff --git a/trace2.c b/trace2.c
index 165264dc79a..a93cab7c2b7 100644
--- a/trace2.c
+++ b/trace2.c
@@ -13,6 +13,7 @@
 #include "trace2/tr2_sysenv.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static int trace2_enabled;
 
@@ -83,6 +84,23 @@ static void tr2_tgt_disable_builtins(void)
 		tgt_j->pfn_term();
 }
 
+/*
+ * The signature of this function must match the pfn_timer
+ * method in the targets.  (Think of this as an apply operation
+ * across the set of active targets.)
+ */
+static void tr2_tgt_emit_a_timer(const struct tr2_timer_metadata *meta,
+				 const struct tr2_timer *timer,
+				 int is_final_data)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_timer)
+			tgt_j->pfn_timer(meta, timer, is_final_data);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -110,6 +128,26 @@ static void tr2main_atexit_handler(void)
 	 */
 	tr2tls_pop_unwind_self();
 
+	/*
+	 * Some timers want per-thread details.  If the main thread
+	 * used one of those timers, emit the details now (before
+	 * we emit the aggregate timer values).
+	 */
+	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+
+	/*
+	 * Add stopwatch timer data for the main thread to the final
+	 * totals.  And then emit the final timer values.
+	 *
+	 * Technically, we shouldn't need to hold the lock to update
+	 * and output the final_timer_block (since all other threads
+	 * should be dead by now), but it doesn't hurt anything.
+	 */
+	tr2tls_lock();
+	tr2_update_final_timers();
+	tr2_emit_final_timers(tr2_tgt_emit_a_timer);
+	tr2tls_unlock();
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_atexit)
 			tgt_j->pfn_atexit(us_elapsed_absolute,
@@ -541,6 +579,21 @@ void trace2_thread_exit_fl(const char *file, int line)
 	tr2tls_pop_unwind_self();
 	us_elapsed_thread = tr2tls_region_elasped_self(us_now);
 
+	/*
+	 * Some timers want per-thread details.  If this thread used
+	 * one of those timers, emit the details now.
+	 */
+	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+
+	/*
+	 * Add stopwatch timer data from the current (non-main) thread
+	 * to the final totals.  (We'll accumulate data for the main
+	 * thread later during "atexit".)
+	 */
+	tr2tls_lock();
+	tr2_update_final_timers();
+	tr2tls_unlock();
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_thread_exit_fl)
 			tgt_j->pfn_thread_exit_fl(file, line,
@@ -795,6 +848,28 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...)
 	va_end(ap);
 }
 
+void trace2_timer_start(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("trace2_timer_start: invalid timer id: %d", tid);
+
+	tr2_start_timer(tid);
+}
+
+void trace2_timer_stop(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("trace2_timer_stop: invalid timer id: %d", tid);
+
+	tr2_stop_timer(tid);
+}
+
 const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
diff --git a/trace2.h b/trace2.h
index 74cdb1354f7..7a843ac0518 100644
--- a/trace2.h
+++ b/trace2.h
@@ -51,6 +51,7 @@ struct json_writer;
  * [] trace2_region*    -- emit region nesting messages.
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
+ * [] trace2_timer*     -- stopwatch timers (messages are deferred).
  */
 
 /*
@@ -485,6 +486,48 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...);
 
 #define trace2_printf(...) trace2_printf_fl(__FILE__, __LINE__, __VA_ARGS__)
 
+/*
+ * Define the set of stopwatch timers.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any values added to this enum must also be added to the
+ * `tr2_timer_metadata[]` in `trace2/tr2_tmr.c`.
+ */
+enum trace2_timer_id {
+	/*
+	 * Define two timers for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_TIMER_ID_TEST2,     /* emits summary and thread events */
+
+	/* Add additional timer definitions before here. */
+	TRACE2_NUMBER_OF_TIMERS
+};
+
+/*
+ * Start/Stop the indicated stopwatch timer in the current thread.
+ *
+ * The time spent by the current thread between the _start and _stop
+ * calls will be added to the thread's partial sum for this timer.
+ *
+ * Timer events are emitted at thread and program exit.
+ *
+ * Note: Since the stopwatch API routines do not generate individual
+ * events, they do not take (file, line) arguments.  Similarly, the
+ * category and timer name values are defined at compile-time in the
+ * timer definitions array, so they are not needed here in the API.
+ */
+void trace2_timer_start(enum trace2_timer_id tid);
+void trace2_timer_stop(enum trace2_timer_id tid);
+
 /*
  * Optional platform-specific code to dump information about the
  * current and any parent process(es).  This is intended to allow
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 65f94e15748..094036964d8 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -4,6 +4,10 @@
 struct child_process;
 struct repository;
 struct json_writer;
+struct tr2_timer_metadata;
+struct tr2_timer;
+
+#define NS_PER_SEC_D ((double)1000*1000*1000)
 
 /*
  * Function prototypes for a TRACE2 "target" vtable.
@@ -96,6 +100,10 @@ typedef void(tr2_tgt_evt_printf_va_fl_t)(const char *file, int line,
 					 uint64_t us_elapsed_absolute,
 					 const char *fmt, va_list ap);
 
+typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta,
+				  const struct tr2_timer *timer,
+				  int is_final_data);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -132,6 +140,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_fl_t                   *pfn_data_fl;
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
+	tr2_tgt_evt_timer_t                     *pfn_timer;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 52f9356c695..dbf6625e1b1 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -9,6 +9,7 @@
 #include "trace2/tr2_sysenv.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_event = {
 	.sysenv_var = TR2_SYSENV_EVENT,
@@ -617,6 +618,30 @@ static void fn_data_json_fl(const char *file, int line,
 	}
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct json_writer jw = JSON_WRITER_INIT;
+	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
+	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
+	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	jw_object_string(&jw, "category", meta->category);
+	jw_object_string(&jw, "name", meta->name);
+	jw_object_intmax(&jw, "intervals", timer->interval_count);
+	jw_object_double(&jw, "t_total", 6, t_total);
+	jw_object_double(&jw, "t_min", 6, t_min);
+	jw_object_double(&jw, "t_max", 6, t_max);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	.pdst = &tr2dst_event,
 
@@ -648,4 +673,5 @@ struct tr2_tgt tr2_tgt_event = {
 	.pfn_data_fl = fn_data_fl,
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = NULL,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 69f80330778..f0582a4bf8a 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -8,6 +8,7 @@
 #include "trace2/tr2_tbuf.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_normal = {
 	.sysenv_var = TR2_SYSENV_NORMAL,
@@ -329,6 +330,27 @@ static void fn_printf_va_fl(const char *file, int line,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct strbuf buf_payload = STRBUF_INIT;
+	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
+	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
+	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
+
+	strbuf_addf(&buf_payload, ("%s %s/%s"
+				   " intervals:%"PRIu64
+				   " total:%8.6f min:%8.6f max:%8.6f"),
+		    event_name, meta->category, meta->name,
+		    timer->interval_count,
+		    t_total, t_min, t_max);
+
+	normal_io_write_fl(__FILE__, __LINE__, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_normal = {
 	.pdst = &tr2dst_normal,
 
@@ -360,4 +382,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	.pfn_data_fl = NULL,
 	.pfn_data_json_fl = NULL,
 	.pfn_printf_va_fl = fn_printf_va_fl,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 59ca58f862d..399d1fa78e7 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -10,6 +10,7 @@
 #include "trace2/tr2_tbuf.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_perf = {
 	.sysenv_var = TR2_SYSENV_PERF,
@@ -555,6 +556,28 @@ static void fn_printf_va_fl(const char *file, int line,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct strbuf buf_payload = STRBUF_INIT;
+	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
+	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
+	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
+
+	strbuf_addf(&buf_payload, ("name:%s"
+				   " intervals:%"PRIu64
+				   " total:%8.6f min:%8.6f max:%8.6f"),
+		    meta->name,
+		    timer->interval_count,
+		    t_total, t_min, t_max);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, NULL, NULL,
+			 meta->category, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	.pdst = &tr2dst_perf,
 
@@ -586,4 +609,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	.pfn_data_fl = fn_data_fl,
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = fn_printf_va_fl,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 3a67532aae4..04900bb4c3a 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -181,3 +181,13 @@ int tr2tls_locked_increment(int *p)
 
 	return current_value;
 }
+
+void tr2tls_lock(void)
+{
+	pthread_mutex_lock(&tr2tls_mutex);
+}
+
+void tr2tls_unlock(void)
+{
+	pthread_mutex_unlock(&tr2tls_mutex);
+}
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index e17cc462f87..2322b0d0ef0 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_tmr.h"
 
 /*
  * Notice: the term "TLS" refers to "thread-local storage" in the
@@ -20,6 +21,9 @@ struct tr2tls_thread_ctx {
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
+	struct tr2_timer_block timer_block;
+	unsigned int used_any_timer:1;
+	unsigned int used_any_per_thread_timer:1;
 };
 
 /*
@@ -107,4 +111,10 @@ int tr2tls_locked_increment(int *p);
  */
 void tr2tls_start_process_clock(void);
 
+/*
+ * Explicitly lock/unlock our mutex.
+ */
+void tr2tls_lock(void);
+void tr2tls_unlock(void);
+
 #endif /* TR2_TLS_H */
diff --git a/trace2/tr2_tmr.c b/trace2/tr2_tmr.c
new file mode 100644
index 00000000000..786762dfd26
--- /dev/null
+++ b/trace2/tr2_tmr.c
@@ -0,0 +1,182 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tgt.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
+
+#define MY_MAX(a, b) ((a) > (b) ? (a) : (b))
+#define MY_MIN(a, b) ((a) < (b) ? (a) : (b))
+
+/*
+ * A global timer block to aggregate values from the partial sums from
+ * each thread.
+ */
+static struct tr2_timer_block final_timer_block; /* access under tr2tls_mutex */
+
+/*
+ * Define metadata for each stopwatch timer.
+ *
+ * This array must match "enum trace2_timer_id" and the values
+ * in "struct tr2_timer_block.timer[*]".
+ */
+static struct tr2_timer_metadata tr2_timer_metadata[TRACE2_NUMBER_OF_TIMERS] = {
+	[TRACE2_TIMER_ID_TEST1] = {
+		.category = "test",
+		.name = "test1",
+		.want_per_thread_events = 0,
+	},
+	[TRACE2_TIMER_ID_TEST2] = {
+		.category = "test",
+		.name = "test2",
+		.want_per_thread_events = 1,
+	},
+
+	/* Add additional metadata before here. */
+};
+
+void tr2_start_timer(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_timer *t = &ctx->timer_block.timer[tid];
+
+	t->recursion_count++;
+	if (t->recursion_count > 1)
+		return; /* ignore recursive starts */
+
+	t->start_ns = getnanotime();
+}
+
+void tr2_stop_timer(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_timer *t = &ctx->timer_block.timer[tid];
+	uint64_t ns_now;
+	uint64_t ns_interval;
+
+	assert(t->recursion_count > 0);
+
+	t->recursion_count--;
+	if (t->recursion_count)
+		return; /* still in recursive call(s) */
+
+	ns_now = getnanotime();
+	ns_interval = ns_now - t->start_ns;
+
+	t->total_ns += ns_interval;
+
+	/*
+	 * min_ns was initialized to zero (in the xcalloc()) rather
+	 * than UINT_MAX when the block of timers was allocated,
+	 * so we should always set both the min_ns and max_ns values
+	 * the first time that the timer is used.
+	 */
+	if (!t->interval_count) {
+		t->min_ns = ns_interval;
+		t->max_ns = ns_interval;
+	} else {
+		t->min_ns = MY_MIN(ns_interval, t->min_ns);
+		t->max_ns = MY_MAX(ns_interval, t->max_ns);
+	}
+
+	t->interval_count++;
+
+	ctx->used_any_timer = 1;
+	if (tr2_timer_metadata[tid].want_per_thread_events)
+		ctx->used_any_per_thread_timer = 1;
+}
+
+void tr2_update_final_timers(void)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_timer_id tid;
+
+	if (!ctx->used_any_timer)
+		return;
+
+	/*
+	 * Accessing `final_timer_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) {
+		struct tr2_timer *t_final = &final_timer_block.timer[tid];
+		struct tr2_timer *t = &ctx->timer_block.timer[tid];
+
+		if (t->recursion_count) {
+			/*
+			 * The current thread is exiting with
+			 * timer[tid] still running.
+			 *
+			 * Technically, this is a bug, but I'm going
+			 * to ignore it.
+			 *
+			 * I don't think it is worth calling die()
+			 * for.  I don't think it is worth killing the
+			 * process for this bookkeeping error.  We
+			 * might want to call warning(), but I'm going
+			 * to wait on that.
+			 *
+			 * The downside here is that total_ns won't
+			 * include the current open interval (now -
+			 * start_ns).  I can live with that.
+			 */
+		}
+
+		if (!t->interval_count)
+			continue; /* this timer was not used by this thread */
+
+		t_final->total_ns += t->total_ns;
+
+		/*
+		 * final_timer_block.timer[tid].min_ns was initialized to
+		 * zero rather than UINT_MAX, so we should always set both
+		 * the min_ns and max_ns values the first time that we add
+		 * a partial sum into it.
+		 */
+		if (!t_final->interval_count) {
+			t_final->min_ns = t->min_ns;
+			t_final->max_ns = t->max_ns;
+		} else {
+			t_final->min_ns = MY_MIN(t_final->min_ns, t->min_ns);
+			t_final->max_ns = MY_MAX(t_final->max_ns, t->max_ns);
+		}
+
+		t_final->interval_count += t->interval_count;
+	}
+}
+
+void tr2_emit_per_thread_timers(tr2_tgt_evt_timer_t *fn_apply)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_timer_id tid;
+
+	if (!ctx->used_any_per_thread_timer)
+		return;
+
+	/*
+	 * For each timer, if the timer wants per-thread events and
+	 * this thread used it, emit it.
+	 */
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++)
+		if (tr2_timer_metadata[tid].want_per_thread_events &&
+		    ctx->timer_block.timer[tid].interval_count)
+			fn_apply(&tr2_timer_metadata[tid],
+				 &ctx->timer_block.timer[tid],
+				 0);
+}
+
+void tr2_emit_final_timers(tr2_tgt_evt_timer_t *fn_apply)
+{
+	enum trace2_timer_id tid;
+
+	/*
+	 * Accessing `final_timer_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++)
+		if (final_timer_block.timer[tid].interval_count)
+			fn_apply(&tr2_timer_metadata[tid],
+				 &final_timer_block.timer[tid],
+				 1);
+}
diff --git a/trace2/tr2_tmr.h b/trace2/tr2_tmr.h
new file mode 100644
index 00000000000..d5753576134
--- /dev/null
+++ b/trace2/tr2_tmr.h
@@ -0,0 +1,140 @@
+#ifndef TR2_TMR_H
+#define TR2_TMR_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow "stopwatch" timers.
+ *
+ * Timers can be used to measure "interesting" activity that does not
+ * fit the "region" model, such as code called from many different
+ * regions (like zlib) and/or where data for individual calls are not
+ * interesting or are too numerous to be efficiently logged.
+ *
+ * Timer values are accumulated during program execution and emitted
+ * to the Trace2 logs at program exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set of
+ * timers and timer ids using a "timer block" array in thread-local
+ * storage.  This gives us constant time access to each timer within
+ * each thread, since we want start/stop operations to be as fast as
+ * possible.  This lets us avoid the complexities of dynamically
+ * allocating a timer on the first use by a thread and/or possibly
+ * sharing that timer definition with other concurrent threads.
+ * However, this does require that we define the set of timers at
+ * compile time.
+ *
+ * Each thread uses the timer block in its thread-local storage to
+ * compute partial sums for each timer (without locking).  When a
+ * thread exits, those partial sums are (under lock) added to the
+ * global final sum.
+ *
+ * Using this "timer block" model costs ~48 bytes per timer per thread
+ * (we have about six uint64 fields per timer).  This does increase
+ * the size of the thread-local storage block, but it is allocated (at
+ * thread create time) and not on the thread stack, so I'm not worried
+ * about the size.
+ *
+ * Partial sums for each timer are optionally emitted when a thread
+ * exits.
+ *
+ * Final sums for each timer are emitted between the "exit" and
+ * "atexit" events.
+ *
+ * A parallel "timer metadata" table contains the "category" and "name"
+ * fields for each timer.  This eliminates the need to include those
+ * args in the various timer APIs.
+ */
+
+/*
+ * The definition of an individual timer as used by an individual
+ * thread.
+ */
+struct tr2_timer {
+	/*
+	 * Total elapsed time for this timer in this thread in nanoseconds.
+	 */
+	uint64_t total_ns;
+
+	/*
+	 * The maximum and minimum interval values observed for this
+	 * timer in this thread.
+	 */
+	uint64_t min_ns;
+	uint64_t max_ns;
+
+	/*
+	 * The value of the clock when this timer was started in this
+	 * thread.  (Undefined when the timer is not active in this
+	 * thread.)
+	 */
+	uint64_t start_ns;
+
+	/*
+	 * Number of times that this timer has been started and stopped
+	 * in this thread.  (Recursive starts are ignored.)
+	 */
+	uint64_t interval_count;
+
+	/*
+	 * Number of nested starts on the stack in this thread.  (We
+	 * ignore recursive starts and use this to track the recursive
+	 * calls.)
+	 */
+	unsigned int recursion_count;
+};
+
+/*
+ * Metadata for a timer.
+ */
+struct tr2_timer_metadata {
+	const char *category;
+	const char *name;
+
+	/*
+	 * True if we should emit per-thread events for this timer
+	 * when individual threads exit.
+	 */
+	unsigned int want_per_thread_events:1;
+};
+
+/*
+ * A compile-time fixed-size block of timers to insert into
+ * thread-local storage.  This wrapper is used to avoid quirks
+ * of C and the usual need to pass an array size argument.
+ */
+struct tr2_timer_block {
+	struct tr2_timer timer[TRACE2_NUMBER_OF_TIMERS];
+};
+
+/*
+ * Private routines used by trace2.c to actually start/stop an
+ * individual timer in the current thread.
+ */
+void tr2_start_timer(enum trace2_timer_id tid);
+void tr2_stop_timer(enum trace2_timer_id tid);
+
+/*
+ * Add the current thread's timer data to the global totals.
+ * This is called during thread-exit.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_update_final_timers(void);
+
+/*
+ * Emit per-thread timer data for the current thread.
+ * This is called during thread-exit.
+ */
+void tr2_emit_per_thread_timers(tr2_tgt_evt_timer_t *fn_apply);
+
+/*
+ * Emit global total timer values.
+ * This is called during atexit handling.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_emit_final_timers(tr2_tgt_evt_timer_t *fn_apply);
+
+#endif /* TR2_TMR_H */
-- 
gitgitgadget
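
The merge step in tr2_update_final_timers() can be sketched in isolation.
This is a minimal illustration, not the code from the patch: the struct and
function names (sketch_timer, sketch_merge_timer) are hypothetical, and the
start_ns and recursion_count fields are left out because they do not take
part in the merge.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for `struct tr2_timer`. */
struct sketch_timer {
	uint64_t total_ns;
	uint64_t min_ns;
	uint64_t max_ns;
	uint64_t interval_count;
};

/*
 * Fold one thread's partial sums into the final totals.  The final
 * timer starts out zeroed, so the first contribution copies min/max
 * instead of comparing against zero.
 */
static void sketch_merge_timer(struct sketch_timer *t_final,
			       const struct sketch_timer *t)
{
	assert(t_final && t);

	if (!t->interval_count)
		return; /* this timer was not used by this thread */

	t_final->total_ns += t->total_ns;

	if (!t_final->interval_count) {
		t_final->min_ns = t->min_ns;
		t_final->max_ns = t->max_ns;
	} else {
		if (t->min_ns < t_final->min_ns)
			t_final->min_ns = t->min_ns;
		if (t->max_ns > t_final->max_ns)
			t_final->max_ns = t->max_ns;
	}

	t_final->interval_count += t->interval_count;
}
```

In the patch the caller holds tr2tls_mutex around this step; the sketch
leaves locking to the caller in the same way.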



* [PATCH v3 8/8] trace2: add global counter mechanism
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                       ` (6 preceding siblings ...)
  2022-10-20 18:28     ` [PATCH v3 7/8] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
@ 2022-10-20 18:28     ` Jeff Hostetler via GitGitGadget
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-20 18:28 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add global counters mechanism to Trace2.

The Trace2 counters mechanism adds the ability to create a set of
global counter variables and an API to increment them efficiently.
Counters can optionally report per-thread usage in addition to the sum
across all threads.

Counter events are emitted to the Trace2 logs when a thread exits and
at process exit.

Counters are an alternative to `data` and `data_json` events.

Counters are useful when you want to measure something across the life
of the process, when you don't want per-measurement events for
performance reasons, when the data does not fit conveniently within a
region, or when your control flow does not easily let you write the
final total.  For example, you might use this to report the number of
calls to unzip() or the number of de-delta steps during a checkout.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
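
The partial-sum model described above can be sketched without any of the
Trace2 plumbing.  This is an illustration under stated assumptions rather
than the code in this patch: every sketch_* name is hypothetical, and each
"thread" is represented by a plain struct so the example stays
single-threaded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter ids; they start at 0 and are contiguous. */
enum sketch_counter_id {
	SKETCH_COUNTER_A = 0,
	SKETCH_COUNTER_B,
	SKETCH_NUMBER_OF_COUNTERS
};

/* One block per thread, plus one block for the final totals. */
struct sketch_counter_block {
	uint64_t value[SKETCH_NUMBER_OF_COUNTERS];
};

/* Hot path: touches only the calling thread's own block, so no lock. */
static void sketch_counter_add(struct sketch_counter_block *self,
			       enum sketch_counter_id cid, uint64_t value)
{
	assert((unsigned int)cid < SKETCH_NUMBER_OF_COUNTERS);
	self->value[cid] += value;
}

/* Thread exit: the caller is assumed to hold the shared lock. */
static void sketch_counter_merge(struct sketch_counter_block *final,
				 const struct sketch_counter_block *self)
{
	int cid;

	for (cid = 0; cid < SKETCH_NUMBER_OF_COUNTERS; cid++)
		final->value[cid] += self->value[cid];
}
```

With three blocks that each add 7 and 13 to the same counter, each partial
sum is 20 and the merged total is 60, which is the shape of the
"201counter" test in t0211 below.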
 Documentation/technical/api-trace2.txt |  31 ++++++++
 Makefile                               |   1 +
 t/helper/test-trace2.c                 |  89 +++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  46 +++++++++++
 trace2.c                               |  52 +++++++++++--
 trace2.h                               |  37 +++++++++
 trace2/tr2_ctr.c                       | 101 ++++++++++++++++++++++++
 trace2/tr2_ctr.h                       | 104 +++++++++++++++++++++++++
 trace2/tr2_tgt.h                       |   7 ++
 trace2/tr2_tgt_event.c                 |  19 +++++
 trace2/tr2_tgt_normal.c                |  16 ++++
 trace2/tr2_tgt_perf.c                  |  17 ++++
 trace2/tr2_tls.h                       |   4 +
 13 files changed, 517 insertions(+), 7 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 75ce6f45603..de5fc250595 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -805,6 +805,37 @@ The "value" field may be an integer or a string.
 }
 ------------
 
+`"th_counter"`::
+	This event logs the value of a counter variable in a thread.
+	This event is generated when a thread exits for counters that
+	requested per-thread events.
++
+------------
+{
+	"event":"th_counter",
+	...
+	"category":"my_category",
+	"name":"my_counter",
+	"count":23
+}
+------------
+
+`"counter"`::
+	This event logs the value of a counter variable across all threads.
+	This event is generated when the process exits.  The total value
+	reported here is the sum across all threads.
++
+------------
+{
+	"event":"counter",
+	...
+	"category":"my_category",
+	"name":"my_counter",
+	"count":23
+}
+------------
+
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
diff --git a/Makefile b/Makefile
index 820649bf62a..29ab417ca3a 100644
--- a/Makefile
+++ b/Makefile
@@ -1094,6 +1094,7 @@ LIB_OBJS += trace.o
 LIB_OBJS += trace2.o
 LIB_OBJS += trace2/tr2_cfg.o
 LIB_OBJS += trace2/tr2_cmd_name.o
+LIB_OBJS += trace2/tr2_ctr.o
 LIB_OBJS += trace2/tr2_dst.o
 LIB_OBJS += trace2/tr2_sid.o
 LIB_OBJS += trace2/tr2_sysenv.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index f951b9e97d7..1b092c60714 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -323,6 +323,92 @@ static int ut_101timer(int argc, const char **argv)
 	return 0;
 }
 
+/*
+ * Single-threaded counter test.  Add several values to the TEST1 counter.
+ * The test script can verify that the final sum is reported in the "counter"
+ * event.
+ */
+static int ut_200counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> [<v2> [...]]";
+	int value;
+	int k;
+
+	if (argc < 1)
+		die("%s", usage_error);
+
+	for (k = 0; k < argc; k++) {
+		if (get_i(&value, argv[k]))
+			die("invalid value[%s] -- %s",
+			    argv[k], usage_error);
+		trace2_counter_add(TRACE2_COUNTER_ID_TEST1, value);
+	}
+
+	return 0;
+}
+
+/*
+ * Multi-threaded counter test.  Create several threads that each increment
+ * the TEST2 global counter.  The test script can verify that an individual
+ * "th_counter" event is generated with a partial sum for each thread and
+ * that a final aggregate "counter" event is generated.
+ */
+
+struct ut_201_data {
+	int v1;
+	int v2;
+};
+
+static void *ut_201counter_thread_proc(void *_ut_201_data)
+{
+	struct ut_201_data *data = _ut_201_data;
+
+	trace2_thread_start("ut_201");
+
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v1);
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v2);
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+static int ut_201counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> <v2> <threads>";
+
+	struct ut_201_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.v1, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.v2, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_201counter_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -346,6 +432,9 @@ static struct unit_test ut_table[] = {
 
 	{ ut_100timer,    "100timer",  "<count> <ms_delay>" },
 	{ ut_101timer,    "101timer",  "<count> <ms_delay> <threads>" },
+
+	{ ut_200counter,  "200counter", "<v1> [<v2> [<v3> [...]]]" },
+	{ ut_201counter,  "201counter", "<v1> <v2> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 5c28424e657..0b3436e8cac 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -222,4 +222,50 @@ test_expect_success 'stopwatch timer test/test2' '
 	have_timer_event "main" "timer" "test" "test2" 15 actual
 '
 
+# Exercise the global counters and confirm that we get the expected values.
+#
+# The counter "test/test1" should only emit a global summary "counter" event.
+# The counter "test/test2" could emit per-thread "th_counter" events and a
+# global summary "counter" event.
+
+have_counter_event () {
+	thread=$1 event=$2 category=$3 name=$4 value=$5 file=$6 &&
+
+	pattern="d0|${thread}|${event}||||${category}|name:${name} value:${value}" &&
+
+	grep "${pattern}" ${file}
+}
+
+test_expect_success 'global counter test/test1' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the counter "test1" and add n integers.
+	test-tool trace2 200counter 1 2 3 4 5 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_counter_event "main" "counter" "test" "test1" 15 actual
+'
+
+test_expect_success 'global counter test/test2' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Add 2 integers to the counter "test2" in each of 3 threads.
+	test-tool trace2 201counter 7 13 3 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	# So we should have 3 per-thread events of 20 each.
+	have_counter_event "th01:ut_201" "th_counter" "test" "test2" 20 actual &&
+	have_counter_event "th02:ut_201" "th_counter" "test" "test2" 20 actual &&
+	have_counter_event "th03:ut_201" "th_counter" "test" "test2" 20 actual &&
+
+	# And we should have a single event with the total across all threads.
+	have_counter_event "main" "counter" "test" "test2" 60 actual
+'
+
 test_done
diff --git a/trace2.c b/trace2.c
index a93cab7c2b7..279bddf53b4 100644
--- a/trace2.c
+++ b/trace2.c
@@ -8,6 +8,7 @@
 #include "version.h"
 #include "trace2/tr2_cfg.h"
 #include "trace2/tr2_cmd_name.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
 #include "trace2/tr2_sysenv.h"
@@ -101,6 +102,22 @@ static void tr2_tgt_emit_a_timer(const struct tr2_timer_metadata *meta,
 			tgt_j->pfn_timer(meta, timer, is_final_data);
 }
 
+/*
+ * The signature of this function must match the pfn_counter
+ * method in the targets.
+ */
+static void tr2_tgt_emit_a_counter(const struct tr2_counter_metadata *meta,
+				   const struct tr2_counter *counter,
+				   int is_final_data)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_counter)
+			tgt_j->pfn_counter(meta, counter, is_final_data);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -132,20 +149,26 @@ static void tr2main_atexit_handler(void)
 	 * Some timers want per-thread details.  If the main thread
 	 * used one of those timers, emit the details now (before
 	 * we emit the aggregate timer values).
+	 *
+	 * Likewise for counters.
 	 */
 	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_per_thread_counters(tr2_tgt_emit_a_counter);
 
 	/*
-	 * Add stopwatch timer data for the main thread to the final
-	 * totals.  And then emit the final timer values.
+	 * Add stopwatch timer and counter data for the main thread to
+	 * the final totals.  And then emit the final values.
 	 *
 	 * Technically, we shouldn't need to hold the lock to update
-	 * and output the final_timer_block (since all other threads
-	 * should be dead by now), but it doesn't hurt anything.
+	 * and output the final_timer_block and final_counter_block
+	 * (since all other threads should be dead by now), but it
+	 * doesn't hurt anything.
 	 */
 	tr2tls_lock();
 	tr2_update_final_timers();
+	tr2_update_final_counters();
 	tr2_emit_final_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_final_counters(tr2_tgt_emit_a_counter);
 	tr2tls_unlock();
 
 	for_each_wanted_builtin (j, tgt_j)
@@ -582,16 +605,20 @@ void trace2_thread_exit_fl(const char *file, int line)
 	/*
 	 * Some timers want per-thread details.  If this thread used
 	 * one of those timers, emit the details now.
+	 *
+	 * Likewise for counters.
 	 */
 	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_per_thread_counters(tr2_tgt_emit_a_counter);
 
 	/*
-	 * Add stopwatch timer data from the current (non-main) thread
-	 * to the final totals.  (We'll accumulate data for the main
-	 * thread later during "atexit".)
+	 * Add stopwatch timer and counter data from the current
+	 * (non-main) thread to the final totals.  (We'll accumulate
+	 * data for the main thread later during "atexit".)
 	 */
 	tr2tls_lock();
 	tr2_update_final_timers();
+	tr2_update_final_counters();
 	tr2tls_unlock();
 
 	for_each_wanted_builtin (j, tgt_j)
@@ -870,6 +897,17 @@ void trace2_timer_stop(enum trace2_timer_id tid)
 	tr2_stop_timer(tid);
 }
 
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (cid < 0 || cid >= TRACE2_NUMBER_OF_COUNTERS)
+		BUG("trace2_counter_add: invalid counter id: %d", cid);
+
+	tr2_counter_increment(cid, value);
+}
+
 const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
diff --git a/trace2.h b/trace2.h
index 7a843ac0518..4ced30c0db3 100644
--- a/trace2.h
+++ b/trace2.h
@@ -52,6 +52,7 @@ struct json_writer;
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
  * [] trace2_timer*     -- stopwatch timers (messages are deferred).
+ * [] trace2_counter*   -- global counters (messages are deferred).
  */
 
 /*
@@ -528,6 +529,42 @@ enum trace2_timer_id {
 void trace2_timer_start(enum trace2_timer_id tid);
 void trace2_timer_stop(enum trace2_timer_id tid);
 
+/*
+ * Define the set of global counters.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any values added to this enum must also be added to the
+ * `tr2_counter_metadata[]` in `trace2/tr2_ctr.c`.
+ */
+enum trace2_counter_id {
+	/*
+	 * Define two counters for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
+
+	/* Add additional counter definitions before here. */
+	TRACE2_NUMBER_OF_COUNTERS
+};
+
+/*
+ * Increase the named global counter by value.
+ *
+ * Note that this adds `value` to the current thread's partial sum for
+ * this counter (without locking) and that the complete sum is not
+ * available until all threads have exited, so it does not return the
+ * new value of the counter.
+ */
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value);
+
 /*
  * Optional platform-specific code to dump information about the
  * current and any parent process(es).  This is intended to allow
diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c
new file mode 100644
index 00000000000..483ca7c308f
--- /dev/null
+++ b/trace2/tr2_ctr.c
@@ -0,0 +1,101 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tgt.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_ctr.h"
+
+/*
+ * A global counter block to aggregate values from the partial sums
+ * from each thread.
+ */
+static struct tr2_counter_block final_counter_block; /* access under tr2tls_mutex */
+
+/*
+ * Define metadata for each global counter.
+ *
+ * This array must match the "enum trace2_counter_id" and the values
+ * in "struct tr2_counter_block.counter[*]".
+ */
+static struct tr2_counter_metadata tr2_counter_metadata[TRACE2_NUMBER_OF_COUNTERS] = {
+	[TRACE2_COUNTER_ID_TEST1] = {
+		.category = "test",
+		.name = "test1",
+		.want_per_thread_events = 0,
+	},
+	[TRACE2_COUNTER_ID_TEST2] = {
+		.category = "test",
+		.name = "test2",
+		.want_per_thread_events = 1,
+	},
+
+	/* Add additional metadata before here. */
+};
+
+void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_counter *c = &ctx->counter_block.counter[cid];
+
+	c->value += value;
+
+	ctx->used_any_counter = 1;
+	if (tr2_counter_metadata[cid].want_per_thread_events)
+		ctx->used_any_per_thread_counter = 1;
+}
+
+void tr2_update_final_counters(void)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_counter_id cid;
+
+	if (!ctx->used_any_counter)
+		return;
+
+	/*
+	 * Accessing `final_counter_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) {
+		struct tr2_counter *c_final = &final_counter_block.counter[cid];
+		const struct tr2_counter *c = &ctx->counter_block.counter[cid];
+
+		c_final->value += c->value;
+	}
+}
+
+void tr2_emit_per_thread_counters(tr2_tgt_evt_counter_t *fn_apply)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_counter_id cid;
+
+	if (!ctx->used_any_per_thread_counter)
+		return;
+
+	/*
+	 * For each counter, if the counter wants per-thread events
+	 * and this thread used it (the value is non-zero), emit it.
+	 */
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++)
+		if (tr2_counter_metadata[cid].want_per_thread_events &&
+		    ctx->counter_block.counter[cid].value)
+			fn_apply(&tr2_counter_metadata[cid],
+				 &ctx->counter_block.counter[cid],
+				 0);
+}
+
+void tr2_emit_final_counters(tr2_tgt_evt_counter_t *fn_apply)
+{
+	enum trace2_counter_id cid;
+
+	/*
+	 * Accessing `final_counter_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++)
+		if (final_counter_block.counter[cid].value)
+			fn_apply(&tr2_counter_metadata[cid],
+				 &final_counter_block.counter[cid],
+				 1);
+}
diff --git a/trace2/tr2_ctr.h b/trace2/tr2_ctr.h
new file mode 100644
index 00000000000..a2267ee9901
--- /dev/null
+++ b/trace2/tr2_ctr.h
@@ -0,0 +1,104 @@
+#ifndef TR2_CTR_H
+#define TR2_CTR_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow global "counters".
+ *
+ * Counters can be used to count interesting activity that does not
+ * fit the "region and data" model, such as code called from many
+ * different regions and/or where you want to count a number of items,
+ * but don't have control of when the last item will be processed,
+ * for example when counting the number of calls to `lstat()`.
+ *
+ * Counters differ from Trace2 "data" events.  Data events are emitted
+ * immediately and are appropriate for documenting loop counters at
+ * the end of a region, for example.  Counter values are accumulated
+ * during the program and final counter values are emitted at program
+ * exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set of
+ * counters and counter ids using a fixed size "counter block" array
+ * in thread-local storage.  This gives us constant time, lock-free
+ * access to each counter within each thread.  This lets us avoid the
+ * complexities of dynamically allocating a counter and sharing that
+ * definition with other threads.
+ *
+ * Each thread uses the counter block in its thread-local storage to
+ * increment partial sums for each counter (without locking).  When a
+ * thread exits, those partial sums are (under lock) added to the
+ * global final sum.
+ *
+ * Partial sums for each counter are optionally emitted when a thread
+ * exits.
+ *
+ * Final sums for each counter are emitted between the "exit" and
+ * "atexit" events.
+ *
+ * A parallel "counter metadata" table contains the "category" and
+ * "name" fields for each counter.  This eliminates the need to
+ * include those args in the various counter APIs.
+ */
+
+/*
+ * The definition of an individual counter as used by an individual
+ * thread (and later in aggregation).
+ */
+struct tr2_counter {
+	uint64_t value;
+};
+
+/*
+ * Metadata for a counter.
+ */
+struct tr2_counter_metadata {
+	const char *category;
+	const char *name;
+
+	/*
+	 * True if we should emit per-thread events for this counter
+	 * when individual threads exit.
+	 */
+	unsigned int want_per_thread_events:1;
+};
+
+/*
+ * A compile-time fixed block of counters to insert into thread-local
+ * storage.  This wrapper is used to avoid quirks of C and the usual
+ * need to pass an array size argument.
+ */
+struct tr2_counter_block {
+	struct tr2_counter counter[TRACE2_NUMBER_OF_COUNTERS];
+};
+
+/*
+ * Private routines used by trace2.c to increment a counter for the
+ * current thread.
+ */
+void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value);
+
+/*
+ * Add the current thread's counter data to the global totals.
+ * This is called during thread-exit.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_update_final_counters(void);
+
+/*
+ * Emit per-thread counter data for the current thread.
+ * This is called during thread-exit.
+ */
+void tr2_emit_per_thread_counters(tr2_tgt_evt_counter_t *fn_apply);
+
+/*
+ * Emit global counter values.
+ * This is called during atexit handling.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_emit_final_counters(tr2_tgt_evt_counter_t *fn_apply);
+
+#endif /* TR2_CTR_H */
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 094036964d8..95f4c754726 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -6,6 +6,8 @@ struct repository;
 struct json_writer;
 struct tr2_timer_metadata;
 struct tr2_timer;
+struct tr2_counter_metadata;
+struct tr2_counter;
 
 #define NS_PER_SEC_D ((double)1000*1000*1000)
 
@@ -104,6 +106,10 @@ typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta,
 				  const struct tr2_timer *timer,
 				  int is_final_data);
 
+typedef void(tr2_tgt_evt_counter_t)(const struct tr2_counter_metadata *meta,
+				    const struct tr2_counter *counter,
+				    int is_final_data);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -141,6 +147,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
 	tr2_tgt_evt_timer_t                     *pfn_timer;
+	tr2_tgt_evt_counter_t                   *pfn_counter;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index dbf6625e1b1..981863a6602 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -642,6 +642,24 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	jw_release(&jw);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	jw_object_string(&jw, "category", meta->category);
+	jw_object_string(&jw, "name", meta->name);
+	jw_object_intmax(&jw, "count", counter->value);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	.pdst = &tr2dst_event,
 
@@ -674,4 +692,5 @@ struct tr2_tgt tr2_tgt_event = {
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = NULL,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index f0582a4bf8a..def18674e88 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -351,6 +351,21 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, "%s %s/%s value:%"PRIu64,
+		    event_name, meta->category, meta->name,
+		    counter->value);
+
+	normal_io_write_fl(__FILE__, __LINE__, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_normal = {
 	.pdst = &tr2dst_normal,
 
@@ -383,4 +398,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	.pfn_data_json_fl = NULL,
 	.pfn_printf_va_fl = fn_printf_va_fl,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 399d1fa78e7..db94b2ef47e 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -578,6 +578,22 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, "name:%s value:%"PRIu64,
+		    meta->name,
+		    counter->value);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, NULL, NULL,
+			 meta->category, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	.pdst = &tr2dst_perf,
 
@@ -610,4 +626,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = fn_printf_va_fl,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 2322b0d0ef0..289b62d0721 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_tmr.h"
 
 /*
@@ -22,8 +23,11 @@ struct tr2tls_thread_ctx {
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 	struct tr2_timer_block timer_block;
+	struct tr2_counter_block counter_block;
 	unsigned int used_any_timer:1;
 	unsigned int used_any_per_thread_timer:1;
+	unsigned int used_any_counter:1;
+	unsigned int used_any_per_thread_counter:1;
 };
 
 /*
-- 
gitgitgadget
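
The emit rules in this patch (skip counters that were never used, and emit
per-thread events only for counters that asked for them) can be sketched as
follows.  This is a simplified illustration, not the patch itself: the
sketch_* names are hypothetical, the patch's two separate emit functions are
folded into one with an is_final_data flag, and output goes to stdout
instead of a Trace2 target.  The line format follows the "normal" target's
fn_counter().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct sketch_meta {
	const char *category;
	const char *name;
	unsigned int want_per_thread_events:1;
};

/* Mirrors the two test counters; the table itself is illustrative. */
static const struct sketch_meta sketch_meta[] = {
	{ "test", "test1", 0 },
	{ "test", "test2", 1 },
};
#define SKETCH_NR (sizeof(sketch_meta) / sizeof(sketch_meta[0]))

/* Format one event line in the style of the "normal" target. */
static void sketch_format(char *out, size_t len,
			  const struct sketch_meta *meta,
			  uint64_t value, int is_final_data)
{
	snprintf(out, len, "%s %s/%s value:%" PRIu64,
		 is_final_data ? "counter" : "th_counter",
		 meta->category, meta->name, value);
}

/*
 * Print the events for one block of SKETCH_NR values and return the
 * number of events emitted.
 */
static int sketch_emit(const uint64_t *values, int is_final_data)
{
	char line[128];
	size_t cid;
	int nr = 0;

	for (cid = 0; cid < SKETCH_NR; cid++) {
		if (!values[cid])
			continue; /* counter was never used */
		if (!is_final_data &&
		    !sketch_meta[cid].want_per_thread_events)
			continue; /* summary-only counter */
		sketch_format(line, sizeof(line), &sketch_meta[cid],
			      values[cid], is_final_data);
		puts(line);
		nr++;
	}
	return nr;
}
```

One consequence of testing the value itself is that a counter which was
used but sums to zero is indistinguishable from one that was never touched.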


* Re: [PATCH v3 5/8] trace2: improve thread-name documentation in the thread-context
  2022-10-20 18:28     ` [PATCH v3 5/8] trace2: improve thread-name documentation in the thread-context Jeff Hostetler via GitGitGadget
@ 2022-10-20 18:57       ` Ævar Arnfjörð Bjarmason
  2022-10-20 20:15         ` Jeff Hostetler
  0 siblings, 1 reply; 73+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-20 18:57 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler, Derrick Stolee, Jeff Hostetler


On Thu, Oct 20 2022, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Improve the documentation of the tr2tls_thread_ctx.thread_name field
> and its relation to the tr2tls_thread_ctx.thread_id field.

Good to see this split off, thanks!

> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  trace2/tr2_tls.h | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index d4e725f430b..7d1f03a2ea6 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -25,12 +25,15 @@ struct tr2tls_thread_ctx {
>  /*
>   * Create thread-local storage for the current thread.
>   *
> - * We assume the first thread is "main".  Other threads are given
> - * non-zero thread-ids to help distinguish messages from concurrent
> - * threads.
> - *
> - * Truncate the thread name if necessary to help with column alignment
> - * in printf-style messages.
> + * The first thread in the process will have:
> + *     { .thread_id=0, .thread_name="main" }
> + * Subsequent threads are given a non-zero thread_id and a thread_name
> + * constructed from the id and a thread base name (which is usually just
> + * the name of the thread-proc function).  For example:
> + *     { .thread_id=10, .thread_name="th10fsm-listen" }

I think the example is missing a ":" after the "th10", i.e. it should be
"th10:fsm-listen" per the code in 6/8:

	strbuf_addf(&buf, "th%02d:", ctx->thread_id);
        [...]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 5/8] trace2: improve thread-name documentation in the thread-context
  2022-10-20 18:57       ` Ævar Arnfjörð Bjarmason
@ 2022-10-20 20:15         ` Jeff Hostetler
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-20 20:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Jeff Hostetler via GitGitGadget
  Cc: git, Derrick Stolee, Jeff Hostetler



On 10/20/22 2:57 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Oct 20 2022, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Improve the documentation of the tr2tls_thread_ctx.thread_name field
>> and its relation to the tr2tls_thread_ctx.thread_id field.
> 
> Good to see this split off, thanks!
> 
[...]
>> + * the name of the thread-proc function).  For example:
>> + *     { .thread_id=10, .thread_name="th10fsm-listen" }
> 
> I think the example is missing a ":" after the "th10", i.e. it should be
> "th10:fsm-listen" per the code in 6/8:
> 
> 	strbuf_addf(&buf, "th%02d:", ctx->thread_id);
>          [...]
> 

oops.  :-)
good catch.

i'll fix up and resend, but will wait a bit for any other comments.

Jeff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 7/8] trace2: add stopwatch timers
  2022-10-20 18:28     ` [PATCH v3 7/8] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
@ 2022-10-20 20:25       ` Junio C Hamano
  2022-10-20 20:52         ` Jeff Hostetler
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2022-10-20 20:25 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +#define NS_PER_SEC_D ((double)1000*1000*1000)
> ...
> +	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
> +	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
> +	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;

Hmph, it certainly is an improvement compared to the previous round,
but was there a reason why we did not want a more concise

	#define NS_TO_SECONDS(ns) ((double)(ns) / (1000*1000*1000.))

	double t_total = NS_TO_SECONDS(timer->total_ns);
	double t_min = NS_TO_SECONDS(timer->min_ns);
	double t_max = NS_TO_SECONDS(timer->max_ns);

that does not need to repeat (double) all over?

Not worth a reroll by itself.  Just wanted to know the reasoning
behind it, as I suspect I am missing the reason why it is good to
explicitly cast with (double) in some places; the above does not
look like one, though.

Thanks.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 7/8] trace2: add stopwatch timers
  2022-10-20 20:25       ` Junio C Hamano
@ 2022-10-20 20:52         ` Jeff Hostetler
  2022-10-20 20:55           ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-20 20:52 UTC (permalink / raw)
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Jeff Hostetler



On 10/20/22 4:25 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +#define NS_PER_SEC_D ((double)1000*1000*1000)
>> ...
>> +	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
>> +	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
>> +	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
> 
> Hmph, it certainly is an improvement compared to the previous round,
> but was there a reason why we did not want a more concise
> 
> 	#define NS_TO_SECONDS(ns) ((double)(ns) / (1000*1000*1000.))
> 
> 	double t_total = NS_TO_SECONDS(timer->total_ns);
> 	double t_min = NS_TO_SECONDS(timer->min_ns);
> 	double t_max = NS_TO_SECONDS(timer->max_ns);
> 
> that does not need to repeat (double) all over?
> 
> Not worth a reroll by itself.  Just wanted to know the reasoning
> behind it, as I suspect I am missing the reason why it is good to
> explicitly cast with (double) in some places; the above does not
> look like one, though.
> 
> Thanks.
>

um, it never occurred to me to make it a macro with an arg.
i just did a search/replace on the inline constant.

you're right though. your version is much shorter.

i'll reroll tomorrow with the typo that AEvar found.

Thanks
Jeff



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 7/8] trace2: add stopwatch timers
  2022-10-20 20:52         ` Jeff Hostetler
@ 2022-10-20 20:55           ` Junio C Hamano
  2022-10-21 21:51             ` Jeff Hostetler
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2022-10-20 20:55 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Jeff Hostetler via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Jeff Hostetler

Jeff Hostetler <git@jeffhostetler.com> writes:

> um, it never occurred to me to make it a macro with an arg.

Heh, that was what you responded with "good point" 6 hours ago ;-)

https://lore.kernel.org/git/aeb07c4f-f3f2-4965-6b6b-3ba3b10b2103@jeffhostetler.com/

> i'll reroll tomorrow with the typo that AEvar found.

Thanks.  Looking forward to it.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 7/8] trace2: add stopwatch timers
  2022-10-20 20:55           ` Junio C Hamano
@ 2022-10-21 21:51             ` Jeff Hostetler
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler @ 2022-10-21 21:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff Hostetler via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Jeff Hostetler



On 10/20/22 4:55 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> um, it never occurred to me to make it a macro with an arg.
> 
> Heh, that was what you responded with "good point" 6 hours ago ;-)

d'oh.  6 hours was way too many meetings ago.... :-)

Jeff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 0/8] Trace2 timers and counters and some cleanup
  2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                       ` (7 preceding siblings ...)
  2022-10-20 18:28     ` [PATCH v3 8/8] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
@ 2022-10-24 13:40     ` Jeff Hostetler via GitGitGadget
  2022-10-24 13:41       ` [PATCH v4 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
                         ` (8 more replies)
  8 siblings, 9 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-24 13:40 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler

Here is version 4 of this series to add timers and counters to Trace2.

Changes since V3:

 * Fixed typo in the new thread-name documentation.
 * Use a simpler NS_TO_SEC() macro for reporting the timer values.

Jeff Hostetler (8):
  trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  tr2tls: clarify TLS terminology
  api-trace2.txt: elminate section describing the public trace2 API
  trace2: rename the thread_name argument to trace2_thread_start
  trace2: improve thread-name documentation in the thread-context
  trace2: convert ctx.thread_name from strbuf to pointer
  trace2: add stopwatch timers
  trace2: add global counter mechanism

 Documentation/technical/api-trace2.txt | 190 +++++++++++++++++--------
 Makefile                               |   2 +
 t/helper/test-trace2.c                 | 187 ++++++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  95 +++++++++++++
 t/t0211/scrub_perf.perl                |   6 +
 trace2.c                               | 121 +++++++++++++++-
 trace2.h                               | 101 +++++++++++--
 trace2/tr2_ctr.c                       | 101 +++++++++++++
 trace2/tr2_ctr.h                       | 104 ++++++++++++++
 trace2/tr2_tgt.h                       |  16 +++
 trace2/tr2_tgt_event.c                 |  47 +++++-
 trace2/tr2_tgt_normal.c                |  39 +++++
 trace2/tr2_tgt_perf.c                  |  43 +++++-
 trace2/tr2_tls.c                       |  34 +++--
 trace2/tr2_tls.h                       |  55 ++++---
 trace2/tr2_tmr.c                       | 182 +++++++++++++++++++++++
 trace2/tr2_tmr.h                       | 140 ++++++++++++++++++
 17 files changed, 1361 insertions(+), 102 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h


base-commit: 3dcec76d9df911ed8321007b1d197c1a206dc164
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1373%2Fjeffhostetler%2Ftrace2-stopwatch-v4-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1373/jeffhostetler/trace2-stopwatch-v4-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1373

Range-diff vs v3:

 1:  6e7e4f3187e = 1:  6e7e4f3187e trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
 2:  9dee7a75903 = 2:  9dee7a75903 tr2tls: clarify TLS terminology
 3:  804dab9e1a7 = 3:  804dab9e1a7 api-trace2.txt: elminate section describing the public trace2 API
 4:  9adf9cee1a9 = 4:  9adf9cee1a9 trace2: rename the thread_name argument to trace2_thread_start
 5:  8cb206b7632 ! 5:  acfae17548c trace2: improve thread-name documentation in the thread-context
     @@ trace2/tr2_tls.h: struct tr2tls_thread_ctx {
      + * Subsequent threads are given a non-zero thread_id and a thread_name
      + * constructed from the id and a thread base name (which is usually just
      + * the name of the thread-proc function).  For example:
     -+ *     { .thread_id=10, .thread_name="th10fsm-listen" }
     ++ *     { .thread_id=10, .thread_name="th10:fsm-listen" }
      + * This helps to identify and distinguish messages from concurrent threads.
      + * The ctx.thread_name field is truncated if necessary to help with column
      + * alignment in printf-style messages.
 6:  8a89e1aa238 = 6:  79c6406d492 trace2: convert ctx.thread_name from strbuf to pointer
 7:  8e701109976 ! 7:  a10c1bd96bb trace2: add stopwatch timers
     @@ trace2/tr2_tgt.h
      +struct tr2_timer_metadata;
      +struct tr2_timer;
      +
     -+#define NS_PER_SEC_D ((double)1000*1000*1000)
     ++#define NS_TO_SEC(ns) ((double)(ns) / 1.0e9)
       
       /*
        * Function prototypes for a TRACE2 "target" vtable.
     @@ trace2/tr2_tgt_event.c: static void fn_data_json_fl(const char *file, int line,
      +{
      +	const char *event_name = is_final_data ? "timer" : "th_timer";
      +	struct json_writer jw = JSON_WRITER_INIT;
     -+	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
     -+	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
     -+	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
     ++	double t_total = NS_TO_SEC(timer->total_ns);
     ++	double t_min = NS_TO_SEC(timer->min_ns);
     ++	double t_max = NS_TO_SEC(timer->max_ns);
      +
      +	jw_object_begin(&jw, 0);
      +	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
     @@ trace2/tr2_tgt_normal.c: static void fn_printf_va_fl(const char *file, int line,
      +{
      +	const char *event_name = is_final_data ? "timer" : "th_timer";
      +	struct strbuf buf_payload = STRBUF_INIT;
     -+	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
     -+	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
     -+	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
     ++	double t_total = NS_TO_SEC(timer->total_ns);
     ++	double t_min = NS_TO_SEC(timer->min_ns);
     ++	double t_max = NS_TO_SEC(timer->max_ns);
      +
      +	strbuf_addf(&buf_payload, ("%s %s/%s"
      +				   " intervals:%"PRIu64
     @@ trace2/tr2_tgt_perf.c: static void fn_printf_va_fl(const char *file, int line,
      +{
      +	const char *event_name = is_final_data ? "timer" : "th_timer";
      +	struct strbuf buf_payload = STRBUF_INIT;
     -+	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
     -+	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
     -+	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
     ++	double t_total = NS_TO_SEC(timer->total_ns);
     ++	double t_min = NS_TO_SEC(timer->min_ns);
     ++	double t_max = NS_TO_SEC(timer->max_ns);
      +
      +	strbuf_addf(&buf_payload, ("name:%s"
      +				   " intervals:%"PRIu64
 8:  5cd8bdde884 ! 8:  b359a49cec9 trace2: add global counter mechanism
     @@ trace2/tr2_tgt.h: struct repository;
      +struct tr2_counter_metadata;
      +struct tr2_counter;
       
     - #define NS_PER_SEC_D ((double)1000*1000*1000)
     + #define NS_TO_SEC(ns) ((double)(ns) / 1.0e9)
       
      @@ trace2/tr2_tgt.h: typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta,
       				  const struct tr2_timer *timer,

-- 
gitgitgadget
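
The range-diff above replaces the NS_PER_SEC_D constant with a
function-like NS_TO_SEC() macro.  A minimal standalone sketch of that
conversion follows.  Only the macro is taken from the range-diff; the
"demo_timer" struct and "demo_format_timer()" helper are hypothetical
stand-ins, not code from this series.

```c
#include <stdint.h>
#include <stdio.h>

/* Same shape as the macro in the range-diff: one cast, one division. */
#define NS_TO_SEC(ns) ((double)(ns) / 1.0e9)

/*
 * Hypothetical stand-in for the timer data; it only carries the three
 * nanosecond fields that the quoted hunks convert.
 */
struct demo_timer {
	uint64_t total_ns;
	uint64_t min_ns;
	uint64_t max_ns;
};

/*
 * Format the timer values as seconds.  The cast inside the macro keeps
 * the division in floating point, so sub-second values are not
 * truncated to zero as they would be with integer division.
 */
static int demo_format_timer(char *buf, size_t len,
			     const struct demo_timer *t)
{
	return snprintf(buf, len, "total:%.6f min:%.6f max:%.6f",
			NS_TO_SEC(t->total_ns),
			NS_TO_SEC(t->min_ns),
			NS_TO_SEC(t->max_ns));
}
```

Putting the cast inside the macro means each call site passes the raw
uint64_t field and does not need to repeat (double).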

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
@ 2022-10-24 13:41       ` Jeff Hostetler via GitGitGadget
  2022-10-24 20:31         ` Junio C Hamano
  2022-10-24 13:41       ` [PATCH v4 2/8] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
                         ` (7 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-24 13:41 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use "size_t" rather than "int" for the "alloc" and "nr_open_regions"
fields in the "tr2tls_thread_ctx".  These are used by ALLOC_GROW().

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tls.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index b1e327a928e..a90bd639d48 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -11,8 +11,8 @@
 struct tr2tls_thread_ctx {
 	struct strbuf thread_name;
 	uint64_t *array_us_start;
-	int alloc;
-	int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
+	size_t alloc;
+	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 };
 
-- 
gitgitgadget
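
As an illustration of why the counters passed to a growth macro are
declared as size_t, here is a small sketch.  DEMO_ALLOC_GROW is a
hypothetical, simplified macro written for this note; it is not Git's
ALLOC_GROW(), and the growth factor is an assumption.  The struct
mirrors only the three fields touched by the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical simplified growth macro (NOT Git's ALLOC_GROW).
 * Ensures "arr" has room for at least "nr" elements, updating "alloc".
 */
#define DEMO_ALLOC_GROW(arr, nr, alloc) \
	do { \
		if ((nr) > (alloc)) { \
			size_t new_alloc_ = ((alloc) + 16) * 3 / 2; \
			void *p_; \
			if (new_alloc_ < (nr)) \
				new_alloc_ = (nr); \
			p_ = realloc((arr), new_alloc_ * sizeof(*(arr))); \
			if (!p_) \
				abort(); \
			(arr) = p_; \
			(alloc) = new_alloc_; \
		} \
	} while (0)

/* Mirrors the fields changed in the patch; the struct name is made up. */
struct demo_ctx {
	uint64_t *array_us_start;
	size_t alloc;
	size_t nr_open_regions; /* plays role of "nr" */
};

/* Record the start time of a newly opened region. */
static void demo_push_region(struct demo_ctx *ctx, uint64_t us_start)
{
	DEMO_ALLOC_GROW(ctx->array_us_start, ctx->nr_open_regions + 1,
			ctx->alloc);
	ctx->array_us_start[ctx->nr_open_regions++] = us_start;
}
```

With size_t fields, the element count, the capacity, and the byte size
handed to realloc() all use the same unsigned type, so no signed to
unsigned conversion happens inside the macro.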


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v4 2/8] tr2tls: clarify TLS terminology
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
  2022-10-24 13:41       ` [PATCH v4 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
@ 2022-10-24 13:41       ` Jeff Hostetler via GitGitGadget
  2022-10-24 13:41       ` [PATCH v4 3/8] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
                         ` (6 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-24 13:41 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Reduce or eliminate use of the term "TLS" in the Trace2 code.

The term "TLS" has two popular meanings: "thread-local storage" and
"transport layer security".  In the Trace2 source, the term is associated
with the former.  There was concern on the mailing list about it referring
to the latter.

Update the source and documentation to eliminate the use of the "TLS" term
or replace it with the phrase "thread-local storage" to reduce ambiguity.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt |  8 ++++----
 trace2.c                               |  2 +-
 trace2.h                               | 10 +++++-----
 trace2/tr2_tls.c                       |  6 +++---
 trace2/tr2_tls.h                       | 18 +++++++++++-------
 5 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 2afa28bb5aa..431d424f9d5 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -685,8 +685,8 @@ The "exec_id" field is a command-unique id and is only useful if the
 
 `"thread_start"`::
 	This event is generated when a thread is started.  It is
-	generated from *within* the new thread's thread-proc (for TLS
-	reasons).
+	generated from *within* the new thread's thread-proc (because
+	it needs to access data in the thread's thread-local storage).
 +
 ------------
 {
@@ -698,7 +698,7 @@ The "exec_id" field is a command-unique id and is only useful if the
 
 `"thread_exit"`::
 	This event is generated when a thread exits.  It is generated
-	from *within* the thread's thread-proc (for TLS reasons).
+	from *within* the thread's thread-proc.
 +
 ------------
 {
@@ -1206,7 +1206,7 @@ worked on 508 items at offset 2032.  Thread "th04" worked on 508 items
 at offset 508.
 +
 This example also shows that thread names are assigned in a racy manner
-as each thread starts and allocates TLS storage.
+as each thread starts.
 
 Config (def param) Events::
 
diff --git a/trace2.c b/trace2.c
index 0c0a11e07d5..c1244e45ace 100644
--- a/trace2.c
+++ b/trace2.c
@@ -52,7 +52,7 @@ static struct tr2_tgt *tr2_tgt_builtins[] =
  * Force (rather than lazily) initialize any of the requested
  * builtin TRACE2 targets at startup (and before we've seen an
  * actual TRACE2 event call) so we can see if we need to setup
- * the TR2 and TLS machinery.
+ * private data structures and thread-local storage.
  *
  * Return the number of builtin targets enabled.
  */
diff --git a/trace2.h b/trace2.h
index 88d906ea830..af3c11694cc 100644
--- a/trace2.h
+++ b/trace2.h
@@ -73,8 +73,7 @@ void trace2_initialize_clock(void);
 /*
  * Initialize TRACE2 tracing facility if any of the builtin TRACE2
  * targets are enabled in the system config or the environment.
- * This includes setting up the Trace2 thread local storage (TLS).
- * Emits a 'version' message containing the version of git
+ * This emits a 'version' message containing the version of git
  * and the Trace2 protocol.
  *
  * This function should be called from `main()` as early as possible in
@@ -302,7 +301,8 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code);
 
 /*
  * Emit a 'thread_start' event.  This must be called from inside the
- * thread-proc to set up the trace2 TLS data for the thread.
+ * thread-proc to allow the thread to create its own thread-local
+ * storage.
  *
  * Thread names should be descriptive, like "preload_index".
  * Thread names will be decorated with an instance number automatically.
@@ -315,8 +315,8 @@ void trace2_thread_start_fl(const char *file, int line,
 
 /*
  * Emit a 'thread_exit' event.  This must be called from inside the
- * thread-proc to report thread-specific data and cleanup TLS data
- * for the thread.
+ * thread-proc so that the thread can access and clean up its
+ * thread-local storage.
  */
 void trace2_thread_exit_fl(const char *file, int line);
 
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 7da94aba522..8d2182fbdbb 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -69,9 +69,9 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void)
 	ctx = pthread_getspecific(tr2tls_key);
 
 	/*
-	 * If the thread-proc did not call trace2_thread_start(), we won't
-	 * have any TLS data associated with the current thread.  Fix it
-	 * here and silently continue.
+	 * If the current thread's thread-proc did not call
+	 * trace2_thread_start(), then the thread will not have any
+	 * thread-local storage.  Create it now and silently continue.
 	 */
 	if (!ctx)
 		ctx = tr2tls_create_self("unknown", getnanotime() / 1000);
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index a90bd639d48..1297509fd23 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -3,6 +3,12 @@
 
 #include "strbuf.h"
 
+/*
+ * Notice: the term "TLS" refers to "thread-local storage" in the
+ * Trace2 source files.  This usage is borrowed from GCC and Windows.
+ * There is NO relation to "transport layer security".
+ */
+
 /*
  * Arbitry limit for thread names for column alignment.
  */
@@ -17,9 +23,7 @@ struct tr2tls_thread_ctx {
 };
 
 /*
- * Create TLS data for the current thread.  This gives us a place to
- * put per-thread data, such as thread start time, function nesting
- * and a per-thread label for our messages.
+ * Create thread-local storage for the current thread.
  *
  * We assume the first thread is "main".  Other threads are given
  * non-zero thread-ids to help distinguish messages from concurrent
@@ -35,7 +39,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 					     uint64_t us_thread_start);
 
 /*
- * Get our TLS data.
+ * Get the thread-local storage pointer of the current thread.
  */
 struct tr2tls_thread_ctx *tr2tls_get_self(void);
 
@@ -45,7 +49,7 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void);
 int tr2tls_is_main_thread(void);
 
 /*
- * Free our TLS data.
+ * Free the current thread's thread-local storage.
  */
 void tr2tls_unset_self(void);
 
@@ -81,12 +85,12 @@ uint64_t tr2tls_region_elasped_self(uint64_t us);
 uint64_t tr2tls_absolute_elapsed(uint64_t us);
 
 /*
- * Initialize the tr2 TLS system.
+ * Initialize thread-local storage for Trace2.
  */
 void tr2tls_init(void);
 
 /*
- * Free all tr2 TLS resources.
+ * Free all Trace2 thread-local storage resources.
  */
 void tr2tls_release(void);
 
-- 
gitgitgadget
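
The tr2tls_get_self() hunk above describes creating thread-local
storage lazily when a thread did not set it up itself.  The following
is a rough standalone sketch of that pattern using POSIX thread-specific
data.  All "demo_" names are hypothetical and the id assignment is an
assumption made for this note; it is not the Trace2 implementation.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread context; Trace2's real struct has more fields. */
struct demo_thread_ctx {
	int thread_id;
};

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_next_thread_id; /* first caller gets id 0 */

/* Destructor run by pthreads when a thread with a non-NULL value exits. */
static void demo_free_ctx(void *p)
{
	free(p);
}

static void demo_init_key(void)
{
	if (pthread_key_create(&demo_key, demo_free_ctx))
		abort();
}

/*
 * Return the calling thread's context.  If the thread has none yet,
 * create it now and continue, similar in spirit to the fallback
 * described in the comment in the hunk above.
 */
static struct demo_thread_ctx *demo_get_self(void)
{
	struct demo_thread_ctx *ctx;

	pthread_once(&demo_once, demo_init_key);
	ctx = pthread_getspecific(demo_key);
	if (!ctx) {
		ctx = calloc(1, sizeof(*ctx));
		if (!ctx)
			abort();
		pthread_mutex_lock(&demo_mutex);
		ctx->thread_id = demo_next_thread_id++;
		pthread_mutex_unlock(&demo_mutex);
		pthread_setspecific(demo_key, ctx);
	}
	return ctx;
}

/* Thread-proc that reports the id it was assigned. */
static void *demo_thread_proc(void *arg)
{
	int *out = arg;

	*out = demo_get_self()->thread_id;
	return NULL;
}
```

Each thread sees only its own pointer through the shared key, which is
the "thread-local storage" meaning of TLS that the commit message
refers to.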


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v4 3/8] api-trace2.txt: elminate section describing the public trace2 API
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
  2022-10-24 13:41       ` [PATCH v4 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
  2022-10-24 13:41       ` [PATCH v4 2/8] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
@ 2022-10-24 13:41       ` Jeff Hostetler via GitGitGadget
  2022-10-24 13:41       ` [PATCH v4 4/8] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
                         ` (5 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-24 13:41 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Eliminate the mostly obsolete `Public API` sub-section from the
`Trace2 API` section in the documentation.  Strengthen the referral
to `trace2.h`.

Most of the technical information in this sub-section was moved to
`trace2.h` in 6c51cb525d (trace2: move doc to trace2.h, 2019-11-17) to
be adjacent to the function prototypes.  The remaining text wasn't
that useful by itself.

Furthermore, the text would need a bit of overhaul to add routines
that do not immediately generate a message, such as stopwatch timers.
So it seemed simpler to just get rid of it.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt | 61 +++-----------------------
 1 file changed, 7 insertions(+), 54 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 431d424f9d5..9d43909d068 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -148,20 +148,18 @@ filename collisions).
 
 == Trace2 API
 
-All public Trace2 functions and macros are defined in `trace2.h` and
-`trace2.c`.  All public symbols are prefixed with `trace2_`.
+The Trace2 public API is defined and documented in `trace2.h`; refer to it for
+more information.  All public functions and macros are prefixed
+with `trace2_` and are implemented in `trace2.c`.
 
 There are no public Trace2 data structures.
 
 The Trace2 code also defines a set of private functions and data types
 in the `trace2/` directory.  These symbols are prefixed with `tr2_`
-and should only be used by functions in `trace2.c`.
+and should only be used by functions in `trace2.c` (or other private
+source files in `trace2/`).
 
-== Conventions for Public Functions and Macros
-
-The functions defined by the Trace2 API are declared and documented
-in `trace2.h`.  It defines the API functions and wrapper macros for
-Trace2.
+=== Conventions for Public Functions and Macros
 
 Some functions have a `_fl()` suffix to indicate that they take `file`
 and `line-number` arguments.
@@ -172,52 +170,7 @@ take a `va_list` argument.
 Some functions have a `_printf_fl()` suffix to indicate that they also
 take a `printf()` style format with a variable number of arguments.
 
-There are CPP wrapper macros and `#ifdef`s to hide most of these details.
-See `trace2.h` for more details.  The following discussion will only
-describe the simplified forms.
-
-== Public API
-
-All Trace2 API functions send a message to all of the active
-Trace2 Targets.  This section describes the set of available
-messages.
-
-It helps to divide these functions into groups for discussion
-purposes.
-
-=== Basic Command Messages
-
-These are concerned with the lifetime of the overall git process.
-e.g: `void trace2_initialize_clock()`, `void trace2_initialize()`,
-`int trace2_is_enabled()`, `void trace2_cmd_start(int argc, const char **argv)`.
-
-=== Command Detail Messages
-
-These are concerned with describing the specific Git command
-after the command line, config, and environment are inspected.
-e.g: `void trace2_cmd_name(const char *name)`,
-`void trace2_cmd_mode(const char *mode)`.
-
-=== Child Process Messages
-
-These are concerned with the various spawned child processes,
-including shell scripts, git commands, editors, pagers, and hooks.
-
-e.g: `void trace2_child_start(struct child_process *cmd)`.
-
-=== Git Thread Messages
-
-These messages are concerned with Git thread usage.
-
-e.g: `void trace2_thread_start(const char *thread_name)`.
-
-=== Region and Data Messages
-
-These are concerned with recording performance data
-over regions or spans of code. e.g:
-`void trace2_region_enter(const char *category, const char *label, const struct repository *repo)`.
-
-Refer to trace2.h for details about all trace2 functions.
+CPP wrapper macros are defined to hide most of these details.
 
 == Trace2 Target Formats
 
-- 
gitgitgadget
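
The documentation hunk above mentions `_fl()` functions that take file
and line-number arguments, and CPP wrapper macros that hide them.  A
small sketch of that convention follows.  The "demo_" names are
hypothetical and exist only for this note; the real wrappers are in
trace2.h.

```c
#include <stdio.h>

/* Remember the most recent call site so the effect can be inspected. */
static const char *demo_last_file;
static int demo_last_line;

/*
 * The "_fl" form takes the caller's file and line explicitly, in the
 * same argument order as trace2_thread_start_fl().
 */
static void demo_thread_start_fl(const char *file, int line,
				 const char *thread_base_name)
{
	demo_last_file = file;
	demo_last_line = line;
	fprintf(stderr, "%s:%d thread_start %s\n",
		file, line, thread_base_name);
}

/*
 * The wrapper macro supplies __FILE__ and __LINE__ at the call site,
 * so callers only pass the arguments they care about.
 */
#define demo_thread_start(thread_base_name) \
	demo_thread_start_fl(__FILE__, __LINE__, (thread_base_name))
```

Because the macro expands where it is used, the recorded location is
that of the caller rather than of the wrapper itself.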


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v4 4/8] trace2: rename the thread_name argument to trace2_thread_start
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-10-24 13:41       ` [PATCH v4 3/8] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
@ 2022-10-24 13:41       ` Jeff Hostetler via GitGitGadget
  2022-10-24 13:41       ` [PATCH v4 5/8] trace2: improve thread-name documentation in the thread-context Jeff Hostetler via GitGitGadget
                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-24 13:41 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Rename the `thread_name` argument in `tr2tls_create_self()` and
`trace2_thread_start()` to be `thread_base_name` to make it clearer
that the passed argument is a component used in the construction of
the actual `struct tr2tls_thread_ctx.thread_name` variable.

The base name will be used along with the thread id to create a
unique thread name.

This commit does not change how the `thread_name` field is
allocated or stored within the `tr2tls_thread_ctx` structure.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2.c         |  6 +++---
 trace2.h         | 11 ++++++-----
 trace2/tr2_tls.c |  4 ++--
 trace2/tr2_tls.h |  2 +-
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/trace2.c b/trace2.c
index c1244e45ace..165264dc79a 100644
--- a/trace2.c
+++ b/trace2.c
@@ -466,7 +466,7 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code)
 				file, line, us_elapsed_absolute, exec_id, code);
 }
 
-void trace2_thread_start_fl(const char *file, int line, const char *thread_name)
+void trace2_thread_start_fl(const char *file, int line, const char *thread_base_name)
 {
 	struct tr2_tgt *tgt_j;
 	int j;
@@ -488,14 +488,14 @@ void trace2_thread_start_fl(const char *file, int line, const char *thread_name)
 		 */
 		trace2_region_enter_printf_fl(file, line, NULL, NULL, NULL,
 					      "thread-proc on main: %s",
-					      thread_name);
+					      thread_base_name);
 		return;
 	}
 
 	us_now = getnanotime() / 1000;
 	us_elapsed_absolute = tr2tls_absolute_elapsed(us_now);
 
-	tr2tls_create_self(thread_name, us_now);
+	tr2tls_create_self(thread_base_name, us_now);
 
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_thread_start_fl)
diff --git a/trace2.h b/trace2.h
index af3c11694cc..74cdb1354f7 100644
--- a/trace2.h
+++ b/trace2.h
@@ -304,14 +304,15 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code);
  * thread-proc to allow the thread to create its own thread-local
  * storage.
  *
- * Thread names should be descriptive, like "preload_index".
- * Thread names will be decorated with an instance number automatically.
+ * The thread base name should be descriptive, like "preload_index" or
+ * taken from the thread-proc function.  A unique thread name will be
+ * created from the given base name and the thread id automatically.
  */
 void trace2_thread_start_fl(const char *file, int line,
-			    const char *thread_name);
+			    const char *thread_base_name);
 
-#define trace2_thread_start(thread_name) \
-	trace2_thread_start_fl(__FILE__, __LINE__, (thread_name))
+#define trace2_thread_start(thread_base_name) \
+	trace2_thread_start_fl(__FILE__, __LINE__, (thread_base_name))
 
 /*
  * Emit a 'thread_exit' event.  This must be called from inside the
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 8d2182fbdbb..4f7c516ecb6 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -31,7 +31,7 @@ void tr2tls_start_process_clock(void)
 	tr2tls_us_start_process = getnanotime() / 1000;
 }
 
-struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
+struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 					     uint64_t us_thread_start)
 {
 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
@@ -50,7 +50,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 	strbuf_init(&ctx->thread_name, 0);
 	if (ctx->thread_id)
 		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_name);
+	strbuf_addstr(&ctx->thread_name, thread_base_name);
 	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
 		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
 
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 1297509fd23..d4e725f430b 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -35,7 +35,7 @@ struct tr2tls_thread_ctx {
  * In this and all following functions the term "self" refers to the
  * current thread.
  */
-struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
+struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 					     uint64_t us_thread_start);
 
 /*
-- 
gitgitgadget



* [PATCH v4 5/8] trace2: improve thread-name documentation in the thread-context
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-10-24 13:41       ` [PATCH v4 4/8] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
@ 2022-10-24 13:41       ` Jeff Hostetler via GitGitGadget
  2022-10-24 13:41       ` [PATCH v4 6/8] trace2: convert ctx.thread_name from strbuf to pointer Jeff Hostetler via GitGitGadget
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-24 13:41 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Improve the documentation of the tr2tls_thread_ctx.thread_name field
and its relation to the tr2tls_thread_ctx.thread_id field.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tls.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index d4e725f430b..3ac4380d829 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -25,12 +25,15 @@ struct tr2tls_thread_ctx {
 /*
  * Create thread-local storage for the current thread.
  *
- * We assume the first thread is "main".  Other threads are given
- * non-zero thread-ids to help distinguish messages from concurrent
- * threads.
- *
- * Truncate the thread name if necessary to help with column alignment
- * in printf-style messages.
+ * The first thread in the process will have:
+ *     { .thread_id=0, .thread_name="main" }
+ * Subsequent threads are given a non-zero thread_id and a thread_name
+ * constructed from the id and a thread base name (which is usually just
+ * the name of the thread-proc function).  For example:
+ *     { .thread_id=10, .thread_name="th10:fsm-listen" }
+ * This helps to identify and distinguish messages from concurrent threads.
+ * The ctx.thread_name field is truncated if necessary to help with column
+ * alignment in printf-style messages.
  *
  * In this and all following functions the term "self" refers to the
  * current thread.
-- 
gitgitgadget



* [PATCH v4 6/8] trace2: convert ctx.thread_name from strbuf to pointer
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                         ` (4 preceding siblings ...)
  2022-10-24 13:41       ` [PATCH v4 5/8] trace2: improve thread-name documentation in the thread-context Jeff Hostetler via GitGitGadget
@ 2022-10-24 13:41       ` Jeff Hostetler via GitGitGadget
  2022-10-24 13:41       ` [PATCH v4 7/8] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-24 13:41 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf`
to a "const char*" pointer.

The `thread_name` field is a constant string that is constructed when
the context is created.  Using a (non-const) `strbuf` structure for it
caused some confusion in the past because it implied that someone
could rename a thread after it was created.  That usage was not
intended.  Change it to a const pointer to make the intent more clear.
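
To illustrate the pattern described above, here is a minimal standalone
sketch that builds the name once in a temporary buffer and then hands
out only a read-only pointer.  It does not use Git's `strbuf` API;
`make_thread_name()` and `MAX_THREAD_NAME` are hypothetical names chosen
for this example, and the limit of 24 is assumed to mirror
`TR2_MAX_THREAD_NAME`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREAD_NAME 24 /* assumed to mirror TR2_MAX_THREAD_NAME */

/*
 * Hypothetical helper (not part of Git).  Build "thNN:<base>" for a
 * non-zero thread id, or just "<base>" for id 0, truncate the result
 * to MAX_THREAD_NAME characters, and return a heap copy.  The caller
 * only ever sees a const pointer, so the name cannot be edited in
 * place after creation.  Release it with free((char *)name).
 */
static const char *make_thread_name(int thread_id, const char *base_name)
{
	char tmp[MAX_THREAD_NAME + 1];
	char *name;

	/* snprintf() truncates to sizeof(tmp) - 1 characters plus NUL. */
	if (thread_id)
		snprintf(tmp, sizeof(tmp), "th%02d:%s", thread_id, base_name);
	else
		snprintf(tmp, sizeof(tmp), "%s", base_name);

	name = malloc(strlen(tmp) + 1);
	if (!name)
		return NULL;
	strcpy(name, tmp);
	return name;
}
```

With these assumptions, `make_thread_name(10, "fsm-listen")` yields
"th10:fsm-listen", and an overly long base name is cut off at 24
characters.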

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tgt_event.c |  2 +-
 trace2/tr2_tgt_perf.c  |  2 +-
 trace2/tr2_tls.c       | 16 +++++++++-------
 trace2/tr2_tls.h       |  2 +-
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 37a3163be12..52f9356c695 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -90,7 +90,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread", ctx->thread_name.buf);
+	jw_object_string(jw, "thread", ctx->thread_name);
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 8cb792488c8..59ca58f862d 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -108,7 +108,7 @@ static void perf_fmt_prepare(const char *event_name,
 
 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
 	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
-		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
+		    ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
 		    event_name);
 
 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 4f7c516ecb6..3a67532aae4 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 					     uint64_t us_thread_start)
 {
 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
+	struct strbuf buf = STRBUF_INIT;
 
 	/*
 	 * Implicitly "tr2tls_push_self()" to capture the thread's start
@@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name,
 
 	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
 
-	strbuf_init(&ctx->thread_name, 0);
+	strbuf_init(&buf, 0);
 	if (ctx->thread_id)
-		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_base_name);
-	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
-		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
+		strbuf_addf(&buf, "th%02d:", ctx->thread_id);
+	strbuf_addstr(&buf, thread_base_name);
+	if (buf.len > TR2_MAX_THREAD_NAME)
+		strbuf_setlen(&buf, TR2_MAX_THREAD_NAME);
+	ctx->thread_name = strbuf_detach(&buf, NULL);
 
 	pthread_setspecific(tr2tls_key, ctx);
 
@@ -95,7 +97,7 @@ void tr2tls_unset_self(void)
 
 	pthread_setspecific(tr2tls_key, NULL);
 
-	strbuf_release(&ctx->thread_name);
+	free((char *)ctx->thread_name);
 	free(ctx->array_us_start);
 	free(ctx);
 }
@@ -113,7 +115,7 @@ void tr2tls_pop_self(void)
 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 
 	if (!ctx->nr_open_regions)
-		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
+		BUG("no open regions in thread '%s'", ctx->thread_name);
 
 	ctx->nr_open_regions--;
 }
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 3ac4380d829..65836b1399c 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -15,7 +15,7 @@
 #define TR2_MAX_THREAD_NAME (24)
 
 struct tr2tls_thread_ctx {
-	struct strbuf thread_name;
+	const char *thread_name;
 	uint64_t *array_us_start;
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
-- 
gitgitgadget



* [PATCH v4 7/8] trace2: add stopwatch timers
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                         ` (5 preceding siblings ...)
  2022-10-24 13:41       ` [PATCH v4 6/8] trace2: convert ctx.thread_name from strbuf to pointer Jeff Hostetler via GitGitGadget
@ 2022-10-24 13:41       ` Jeff Hostetler via GitGitGadget
  2022-10-24 13:41       ` [PATCH v4 8/8] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
  2022-10-25 12:27       ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Derrick Stolee
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-24 13:41 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add stopwatch timer mechanism to Trace2.

Timers are an alternative to Trace2 Regions.  Regions are useful for
measuring the time spent in various computation phases, such as the
time to read the index, time to scan for unstaged files, time to scan
for untracked files, etc.

However, regions are not appropriate in all places.  For example,
during a checkout, it would be very inefficient to use regions to
measure the total time spent inflating objects from the ODB across
the entire lifetime of the process: a per-unzip() region would flood
the output and significantly slow the command, and some form of
post-processing would be required to compute the time spent in unzip().

Timers can be used to measure a series of timer intervals and emit
a single summary event (at thread and/or process exit).
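
As a rough illustration of the idea (not the code in this patch), the
following standalone sketch accumulates intervals into one record and
merges per-thread records into a process-wide total.  The names
`struct stopwatch`, `stopwatch_start()`, `stopwatch_stop()` and
`stopwatch_merge()` are hypothetical.  Timestamps are passed in
explicitly to keep the sketch clock-agnostic, whereas the real
implementation reads the clock itself, and the merge step is only an
assumption about how the final totals could be combined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified analogue of one per-thread timer. */
struct stopwatch {
	uint64_t total_ns;       /* sum of all completed intervals */
	uint64_t min_ns;         /* shortest completed interval */
	uint64_t max_ns;         /* longest completed interval */
	uint64_t start_ns;       /* start of the currently open interval */
	uint64_t interval_count; /* number of completed intervals */
	unsigned int recursion_count;
};

static void stopwatch_start(struct stopwatch *t, uint64_t now_ns)
{
	if (t->recursion_count++ > 0)
		return; /* nested start: keep the outermost start time */
	t->start_ns = now_ns;
}

static void stopwatch_stop(struct stopwatch *t, uint64_t now_ns)
{
	uint64_t interval;

	assert(t->recursion_count > 0);
	if (--t->recursion_count)
		return; /* still inside a nested start/stop pair */

	interval = now_ns - t->start_ns;
	t->total_ns += interval;

	/* A zero-initialized record has no valid min/max yet. */
	if (!t->interval_count || interval < t->min_ns)
		t->min_ns = interval;
	if (!t->interval_count || interval > t->max_ns)
		t->max_ns = interval;
	t->interval_count++;
}

/*
 * Fold one thread's partial sums into a shared total.  In a threaded
 * program the caller would need to hold a lock around this call.
 */
static void stopwatch_merge(struct stopwatch *dst, const struct stopwatch *src)
{
	if (!src->interval_count)
		return; /* this thread never used the timer */

	if (!dst->interval_count || src->min_ns < dst->min_ns)
		dst->min_ns = src->min_ns;
	if (!dst->interval_count || src->max_ns > dst->max_ns)
		dst->max_ns = src->max_ns;
	dst->total_ns += src->total_ns;
	dst->interval_count += src->interval_count;
}
```

Only the merged record would need to be reported at process exit, so
the output volume is independent of how many times the timer was
started and stopped.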

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt |  90 ++++++++++++
 Makefile                               |   1 +
 t/helper/test-trace2.c                 |  98 +++++++++++++
 t/t0211-trace2-perf.sh                 |  49 +++++++
 t/t0211/scrub_perf.perl                |   6 +
 trace2.c                               |  75 ++++++++++
 trace2.h                               |  43 ++++++
 trace2/tr2_tgt.h                       |   9 ++
 trace2/tr2_tgt_event.c                 |  26 ++++
 trace2/tr2_tgt_normal.c                |  23 ++++
 trace2/tr2_tgt_perf.c                  |  24 ++++
 trace2/tr2_tls.c                       |  10 ++
 trace2/tr2_tls.h                       |  10 ++
 trace2/tr2_tmr.c                       | 182 +++++++++++++++++++++++++
 trace2/tr2_tmr.h                       | 140 +++++++++++++++++++
 15 files changed, 786 insertions(+)
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 9d43909d068..75ce6f45603 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -769,6 +769,42 @@ The "value" field may be an integer or a string.
 }
 ------------
 
+`"th_timer"`::
+	This event logs the amount of time that a stopwatch timer was
+	running in the thread.  This event is generated when a thread
+	exits for timers that requested per-thread events.
++
+------------
+{
+	"event":"th_timer",
+	...
+	"category":"my_category",
+	"name":"my_timer",
+	"intervals":5,         # number of times it was started/stopped
+	"t_total":0.052741,    # total time in seconds it was running
+	"t_min":0.010061,      # shortest interval
+	"t_max":0.011648       # longest interval
+}
+------------
+
+`"timer"`::
+	This event logs the amount of time that a stopwatch timer was
+	running aggregated across all threads.  This event is generated
+	when the process exits.
++
+------------
+{
+	"event":"timer",
+	...
+	"category":"my_category",
+	"name":"my_timer",
+	"intervals":5,         # number of times it was started/stopped
+	"t_total":0.052741,    # total time in seconds it was running
+	"t_min":0.010061,      # shortest interval
+	"t_max":0.011648       # longest interval
+}
+------------
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
@@ -1200,6 +1236,60 @@ d0 | main                     | data         | r0  |  0.002126 |  0.002126 | fsy
 d0 | main                     | exit         |     |  0.000470 |           |              | code:0
 d0 | main                     | atexit       |     |  0.000477 |           |              | code:0
 ----------------
+
+Stopwatch Timer Events::
+
+	Measure the time spent in a function call or span of code
+	that might be called from many places within the code
+	throughout the life of the process.
++
+----------------
+static void expensive_function(void)
+{
+	trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+	...
+	sleep_millisec(1000); // Do something expensive
+	...
+	trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+}
+
+static int ut_100timer(int argc, const char **argv)
+{
+	...
+
+	expensive_function();
+
+	// Do something else 1...
+
+	expensive_function();
+
+	// Do something else 2...
+
+	expensive_function();
+
+	return 0;
+}
+----------------
++
+In this example, we measure the total time spent in
+`expensive_function()` regardless of when it is called
+in the overall flow of the program.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ t/helper/test-tool trace2 100timer 3 1000
+...
+$ cat ~/log.perf
+d0 | main                     | version      |     |           |           |              | ...
+d0 | main                     | start        |     |  0.001453 |           |              | t/helper/test-tool trace2 100timer 3 1000
+d0 | main                     | cmd_name     |     |           |           |              | trace2 (trace2)
+d0 | main                     | exit         |     |  3.003667 |           |              | code:0
+d0 | main                     | timer        |     |           |           | test         | name:test1 intervals:3 total:3.001686 min:1.000254 max:1.000929
+d0 | main                     | atexit       |     |  3.003796 |           |              | code:0
+----------------
+
+
 == Future Work
 
 === Relationship to the Existing Trace Api (api-trace.txt)
diff --git a/Makefile b/Makefile
index cac3452edb9..820649bf62a 100644
--- a/Makefile
+++ b/Makefile
@@ -1102,6 +1102,7 @@ LIB_OBJS += trace2/tr2_tgt_event.o
 LIB_OBJS += trace2/tr2_tgt_normal.o
 LIB_OBJS += trace2/tr2_tgt_perf.o
 LIB_OBJS += trace2/tr2_tls.o
+LIB_OBJS += trace2/tr2_tmr.o
 LIB_OBJS += trailer.o
 LIB_OBJS += transport-helper.o
 LIB_OBJS += transport.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index a714130ece7..f951b9e97d7 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -228,6 +228,101 @@ static int ut_010bug_BUG(int argc, const char **argv)
 	BUG("a %s message", "BUG");
 }
 
+/*
+ * Single-threaded timer test.  Create several intervals using the
+ * TEST1 timer.  The test script can verify that an aggregate Trace2
+ * "timer" event is emitted indicating that we started+stopped the
+ * timer the requested number of times.
+ */
+static int ut_100timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay>";
+
+	int count = 0;
+	int delay = 0;
+	int k;
+
+	if (argc != 2)
+		die("%s", usage_error);
+	if (get_i(&count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&delay, argv[1]))
+		die("%s", usage_error);
+
+	for (k = 0; k < count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+		sleep_millisec(delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+	}
+
+	return 0;
+}
+
+struct ut_101_data {
+	int count;
+	int delay;
+};
+
+static void *ut_101timer_thread_proc(void *_ut_101_data)
+{
+	struct ut_101_data *data = _ut_101_data;
+	int k;
+
+	trace2_thread_start("ut_101");
+
+	for (k = 0; k < data->count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST2);
+		sleep_millisec(data->delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST2);
+	}
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+/*
+ * Multi-threaded timer test.  Create several threads that each create
+ * several intervals using the TEST2 timer.  The test script can verify
+ * that individual Trace2 "th_timer" events for each thread and an
+ * aggregate "timer" event are generated.
+ */
+static int ut_101timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay> <threads>";
+
+	struct ut_101_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.delay, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_101timer_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -248,6 +343,9 @@ static struct unit_test ut_table[] = {
 	{ ut_008bug,      "008bug",    "" },
 	{ ut_009bug_BUG,  "009bug_BUG","" },
 	{ ut_010bug_BUG,  "010bug_BUG","" },
+
+	{ ut_100timer,    "100timer",  "<count> <ms_delay>" },
+	{ ut_101timer,    "101timer",  "<count> <ms_delay> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 22d0845544e..5c28424e657 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -173,4 +173,53 @@ test_expect_success 'using global config, perf stream, return code 0' '
 	test_cmp expect actual
 '
 
+# Exercise the stopwatch timers in a loop and confirm that we have
+# as many start/stop intervals as expected.  We cannot really test the
+# actual (total, min, max) timer values, so we have to assume that they
+# are good, but we can verify the interval count.
+#
+# The timer "test/test1" should only emit a global summary "timer" event.
+# The timer "test/test2" should emit per-thread "th_timer" events and a
+# global summary "timer" event.
+
+have_timer_event () {
+	thread=$1 event=$2 category=$3 name=$4 intervals=$5 file=$6 &&
+
+	pattern="d0|${thread}|${event}||||${category}|name:${name} intervals:${intervals}" &&
+
+	grep "${pattern}" ${file}
+}
+
+test_expect_success 'stopwatch timer test/test1' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the timer "test1" 5 times from "main".
+	test-tool trace2 100timer 5 10 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_timer_event "main" "timer" "test" "test1" 5 actual
+'
+
+test_expect_success 'stopwatch timer test/test2' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the timer "test2" 5 times each in 3 threads.
+	test-tool trace2 101timer 5 10 3 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	# So we should have 3 per-thread events of 5 each.
+	have_timer_event "th01:ut_101" "th_timer" "test" "test2" 5 actual &&
+	have_timer_event "th02:ut_101" "th_timer" "test" "test2" 5 actual &&
+	have_timer_event "th03:ut_101" "th_timer" "test" "test2" 5 actual &&
+
+	# And we should have 15 total uses.
+	have_timer_event "main" "timer" "test" "test2" 15 actual
+'
+
 test_done
diff --git a/t/t0211/scrub_perf.perl b/t/t0211/scrub_perf.perl
index 299999f0f89..7a50bae6463 100644
--- a/t/t0211/scrub_perf.perl
+++ b/t/t0211/scrub_perf.perl
@@ -64,6 +64,12 @@ while (<>) {
 	    goto SKIP_LINE;
 	}
     }
+    elsif ($tokens[$col_event] =~ m/timer/) {
+	# This also captures "th_timer" events
+	$tokens[$col_rest] =~ s/ total:\d+\.\d*/ total:_T_TOTAL_/;
+	$tokens[$col_rest] =~ s/ min:\d+\.\d*/ min:_T_MIN_/;
+	$tokens[$col_rest] =~ s/ max:\d+\.\d*/ max:_T_MAX_/;
+    }
 
     # t_abs and t_rel are either blank or a float.  Replace the float
     # with a constant for matching the HEREDOC in the test script.
diff --git a/trace2.c b/trace2.c
index 165264dc79a..a93cab7c2b7 100644
--- a/trace2.c
+++ b/trace2.c
@@ -13,6 +13,7 @@
 #include "trace2/tr2_sysenv.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static int trace2_enabled;
 
@@ -83,6 +84,23 @@ static void tr2_tgt_disable_builtins(void)
 		tgt_j->pfn_term();
 }
 
+/*
+ * The signature of this function must match the pfn_timer
+ * method in the targets.  (Think of this as an apply operation
+ * across the set of active targets.)
+ */
+static void tr2_tgt_emit_a_timer(const struct tr2_timer_metadata *meta,
+				 const struct tr2_timer *timer,
+				 int is_final_data)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_timer)
+			tgt_j->pfn_timer(meta, timer, is_final_data);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -110,6 +128,26 @@ static void tr2main_atexit_handler(void)
 	 */
 	tr2tls_pop_unwind_self();
 
+	/*
+	 * Some timers want per-thread details.  If the main thread
+	 * used one of those timers, emit the details now (before
+	 * we emit the aggregate timer values).
+	 */
+	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+
+	/*
+	 * Add stopwatch timer data for the main thread to the final
+	 * totals.  And then emit the final timer values.
+	 *
+	 * Technically, we shouldn't need to hold the lock to update
+	 * and output the final_timer_block (since all other threads
+	 * should be dead by now), but it doesn't hurt anything.
+	 */
+	tr2tls_lock();
+	tr2_update_final_timers();
+	tr2_emit_final_timers(tr2_tgt_emit_a_timer);
+	tr2tls_unlock();
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_atexit)
 			tgt_j->pfn_atexit(us_elapsed_absolute,
@@ -541,6 +579,21 @@ void trace2_thread_exit_fl(const char *file, int line)
 	tr2tls_pop_unwind_self();
 	us_elapsed_thread = tr2tls_region_elasped_self(us_now);
 
+	/*
+	 * Some timers want per-thread details.  If this thread used
+	 * one of those timers, emit the details now.
+	 */
+	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+
+	/*
+	 * Add stopwatch timer data from the current (non-main) thread
+	 * to the final totals.  (We'll accumulate data for the main
+	 * thread later during "atexit".)
+	 */
+	tr2tls_lock();
+	tr2_update_final_timers();
+	tr2tls_unlock();
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_thread_exit_fl)
 			tgt_j->pfn_thread_exit_fl(file, line,
@@ -795,6 +848,28 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...)
 	va_end(ap);
 }
 
+void trace2_timer_start(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("trace2_timer_start: invalid timer id: %d", tid);
+
+	tr2_start_timer(tid);
+}
+
+void trace2_timer_stop(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("trace2_timer_stop: invalid timer id: %d", tid);
+
+	tr2_stop_timer(tid);
+}
+
 const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
diff --git a/trace2.h b/trace2.h
index 74cdb1354f7..7a843ac0518 100644
--- a/trace2.h
+++ b/trace2.h
@@ -51,6 +51,7 @@ struct json_writer;
  * [] trace2_region*    -- emit region nesting messages.
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
+ * [] trace2_timer*     -- stopwatch timers (messages are deferred).
  */
 
 /*
@@ -485,6 +486,48 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...);
 
 #define trace2_printf(...) trace2_printf_fl(__FILE__, __LINE__, __VA_ARGS__)
 
+/*
+ * Define the set of stopwatch timers.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any values added to this enum must also be added to the
+ * `tr2_timer_metadata[]` in `trace2/tr2_tmr.c`.
+ */
+enum trace2_timer_id {
+	/*
+	 * Define two timers for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_TIMER_ID_TEST2,     /* emits summary and thread events */
+
+	/* Add additional timer definitions before here. */
+	TRACE2_NUMBER_OF_TIMERS
+};
+
+/*
+ * Start/Stop the indicated stopwatch timer in the current thread.
+ *
+ * The time spent by the current thread between the _start and _stop
+ * calls will be added to the thread's partial sum for this timer.
+ *
+ * Timer events are emitted at thread and program exit.
+ *
+ * Note: Since the stopwatch API routines do not generate individual
+ * events, they do not take (file, line) arguments.  Similarly, the
+ * category and timer name values are defined at compile-time in the
+ * timer definitions array, so they are not needed here in the API.
+ */
+void trace2_timer_start(enum trace2_timer_id tid);
+void trace2_timer_stop(enum trace2_timer_id tid);
+
 /*
  * Optional platform-specific code to dump information about the
  * current and any parent process(es).  This is intended to allow
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 65f94e15748..85c8d2d7f5a 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -4,6 +4,10 @@
 struct child_process;
 struct repository;
 struct json_writer;
+struct tr2_timer_metadata;
+struct tr2_timer;
+
+#define NS_TO_SEC(ns) ((double)(ns) / 1.0e9)
 
 /*
  * Function prototypes for a TRACE2 "target" vtable.
@@ -96,6 +100,10 @@ typedef void(tr2_tgt_evt_printf_va_fl_t)(const char *file, int line,
 					 uint64_t us_elapsed_absolute,
 					 const char *fmt, va_list ap);
 
+typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta,
+				  const struct tr2_timer *timer,
+				  int is_final_data);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -132,6 +140,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_fl_t                   *pfn_data_fl;
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
+	tr2_tgt_evt_timer_t                     *pfn_timer;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 52f9356c695..af5a8edb474 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -9,6 +9,7 @@
 #include "trace2/tr2_sysenv.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_event = {
 	.sysenv_var = TR2_SYSENV_EVENT,
@@ -617,6 +618,30 @@ static void fn_data_json_fl(const char *file, int line,
 	}
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct json_writer jw = JSON_WRITER_INIT;
+	double t_total = NS_TO_SEC(timer->total_ns);
+	double t_min = NS_TO_SEC(timer->min_ns);
+	double t_max = NS_TO_SEC(timer->max_ns);
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	jw_object_string(&jw, "category", meta->category);
+	jw_object_string(&jw, "name", meta->name);
+	jw_object_intmax(&jw, "intervals", timer->interval_count);
+	jw_object_double(&jw, "t_total", 6, t_total);
+	jw_object_double(&jw, "t_min", 6, t_min);
+	jw_object_double(&jw, "t_max", 6, t_max);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	.pdst = &tr2dst_event,
 
@@ -648,4 +673,5 @@ struct tr2_tgt tr2_tgt_event = {
 	.pfn_data_fl = fn_data_fl,
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = NULL,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 69f80330778..b079baf1002 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -8,6 +8,7 @@
 #include "trace2/tr2_tbuf.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_normal = {
 	.sysenv_var = TR2_SYSENV_NORMAL,
@@ -329,6 +330,27 @@ static void fn_printf_va_fl(const char *file, int line,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct strbuf buf_payload = STRBUF_INIT;
+	double t_total = NS_TO_SEC(timer->total_ns);
+	double t_min = NS_TO_SEC(timer->min_ns);
+	double t_max = NS_TO_SEC(timer->max_ns);
+
+	strbuf_addf(&buf_payload, ("%s %s/%s"
+				   " intervals:%"PRIu64
+				   " total:%8.6f min:%8.6f max:%8.6f"),
+		    event_name, meta->category, meta->name,
+		    timer->interval_count,
+		    t_total, t_min, t_max);
+
+	normal_io_write_fl(__FILE__, __LINE__, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_normal = {
 	.pdst = &tr2dst_normal,
 
@@ -360,4 +382,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	.pfn_data_fl = NULL,
 	.pfn_data_json_fl = NULL,
 	.pfn_printf_va_fl = fn_printf_va_fl,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 59ca58f862d..e69375e9799 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -10,6 +10,7 @@
 #include "trace2/tr2_tbuf.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static struct tr2_dst tr2dst_perf = {
 	.sysenv_var = TR2_SYSENV_PERF,
@@ -555,6 +556,28 @@ static void fn_printf_va_fl(const char *file, int line,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_timer(const struct tr2_timer_metadata *meta,
+		     const struct tr2_timer *timer,
+		     int is_final_data)
+{
+	const char *event_name = is_final_data ? "timer" : "th_timer";
+	struct strbuf buf_payload = STRBUF_INIT;
+	double t_total = NS_TO_SEC(timer->total_ns);
+	double t_min = NS_TO_SEC(timer->min_ns);
+	double t_max = NS_TO_SEC(timer->max_ns);
+
+	strbuf_addf(&buf_payload, ("name:%s"
+				   " intervals:%"PRIu64
+				   " total:%8.6f min:%8.6f max:%8.6f"),
+		    meta->name,
+		    timer->interval_count,
+		    t_total, t_min, t_max);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, NULL, NULL,
+			 meta->category, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	.pdst = &tr2dst_perf,
 
@@ -586,4 +609,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	.pfn_data_fl = fn_data_fl,
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = fn_printf_va_fl,
+	.pfn_timer = fn_timer,
 };
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 3a67532aae4..04900bb4c3a 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -181,3 +181,13 @@ int tr2tls_locked_increment(int *p)
 
 	return current_value;
 }
+
+void tr2tls_lock(void)
+{
+	pthread_mutex_lock(&tr2tls_mutex);
+}
+
+void tr2tls_unlock(void)
+{
+	pthread_mutex_unlock(&tr2tls_mutex);
+}
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 65836b1399c..a064b66e4cc 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_tmr.h"
 
 /*
  * Notice: the term "TLS" refers to "thread-local storage" in the
@@ -20,6 +21,9 @@ struct tr2tls_thread_ctx {
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
+	struct tr2_timer_block timer_block;
+	unsigned int used_any_timer:1;
+	unsigned int used_any_per_thread_timer:1;
 };
 
 /*
@@ -107,4 +111,10 @@ int tr2tls_locked_increment(int *p);
  */
 void tr2tls_start_process_clock(void);
 
+/*
+ * Explicitly lock/unlock our mutex.
+ */
+void tr2tls_lock(void);
+void tr2tls_unlock(void);
+
 #endif /* TR2_TLS_H */
diff --git a/trace2/tr2_tmr.c b/trace2/tr2_tmr.c
new file mode 100644
index 00000000000..786762dfd26
--- /dev/null
+++ b/trace2/tr2_tmr.c
@@ -0,0 +1,182 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tgt.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
+
+#define MY_MAX(a, b) ((a) > (b) ? (a) : (b))
+#define MY_MIN(a, b) ((a) < (b) ? (a) : (b))
+
+/*
+ * A global timer block to aggregate values from the partial sums from
+ * each thread.
+ */
+static struct tr2_timer_block final_timer_block; /* access under tr2tls_mutex */
+
+/*
+ * Define metadata for each stopwatch timer.
+ *
+ * This array must match "enum trace2_timer_id" and the values
+ * in "struct tr2_timer_block.timer[*]".
+ */
+static struct tr2_timer_metadata tr2_timer_metadata[TRACE2_NUMBER_OF_TIMERS] = {
+	[TRACE2_TIMER_ID_TEST1] = {
+		.category = "test",
+		.name = "test1",
+		.want_per_thread_events = 0,
+	},
+	[TRACE2_TIMER_ID_TEST2] = {
+		.category = "test",
+		.name = "test2",
+		.want_per_thread_events = 1,
+	},
+
+	/* Add additional metadata before here. */
+};
+
+void tr2_start_timer(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_timer *t = &ctx->timer_block.timer[tid];
+
+	t->recursion_count++;
+	if (t->recursion_count > 1)
+		return; /* ignore recursive starts */
+
+	t->start_ns = getnanotime();
+}
+
+void tr2_stop_timer(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_timer *t = &ctx->timer_block.timer[tid];
+	uint64_t ns_now;
+	uint64_t ns_interval;
+
+	assert(t->recursion_count > 0);
+
+	t->recursion_count--;
+	if (t->recursion_count)
+		return; /* still in recursive call(s) */
+
+	ns_now = getnanotime();
+	ns_interval = ns_now - t->start_ns;
+
+	t->total_ns += ns_interval;
+
+	/*
+	 * min_ns was initialized to zero (in the xcalloc()) rather
+	 * than UINT64_MAX when the block of timers was allocated,
+	 * so we should always set both the min_ns and max_ns values
+	 * the first time that the timer is used.
+	 */
+	if (!t->interval_count) {
+		t->min_ns = ns_interval;
+		t->max_ns = ns_interval;
+	} else {
+		t->min_ns = MY_MIN(ns_interval, t->min_ns);
+		t->max_ns = MY_MAX(ns_interval, t->max_ns);
+	}
+
+	t->interval_count++;
+
+	ctx->used_any_timer = 1;
+	if (tr2_timer_metadata[tid].want_per_thread_events)
+		ctx->used_any_per_thread_timer = 1;
+}
+
+void tr2_update_final_timers(void)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_timer_id tid;
+
+	if (!ctx->used_any_timer)
+		return;
+
+	/*
+	 * Accessing `final_timer_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) {
+		struct tr2_timer *t_final = &final_timer_block.timer[tid];
+		struct tr2_timer *t = &ctx->timer_block.timer[tid];
+
+		if (t->recursion_count) {
+			/*
+			 * The current thread is exiting with
+			 * timer[tid] still running.
+			 *
+			 * Technically, this is a bug, but I'm going
+			 * to ignore it.
+			 *
+			 * I don't think it is worth calling die()
+			 * for.  I don't think it is worth killing the
+			 * process for this bookkeeping error.  We
+			 * might want to call warning(), but I'm going
+			 * to wait on that.
+			 *
+			 * The downside here is that total_ns won't
+			 * include the current open interval (now -
+			 * start_ns).  I can live with that.
+			 */
+		}
+
+		if (!t->interval_count)
+			continue; /* this timer was not used by this thread */
+
+		t_final->total_ns += t->total_ns;
+
+		/*
+		 * final_timer_block.timer[tid].min_ns was initialized to
+		 * zero rather than UINT64_MAX, so we should always set both
+		 * the min_ns and max_ns values the first time that we add
+		 * a partial sum into it.
+		 */
+		if (!t_final->interval_count) {
+			t_final->min_ns = t->min_ns;
+			t_final->max_ns = t->max_ns;
+		} else {
+			t_final->min_ns = MY_MIN(t_final->min_ns, t->min_ns);
+			t_final->max_ns = MY_MAX(t_final->max_ns, t->max_ns);
+		}
+
+		t_final->interval_count += t->interval_count;
+	}
+}
+
+void tr2_emit_per_thread_timers(tr2_tgt_evt_timer_t *fn_apply)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_timer_id tid;
+
+	if (!ctx->used_any_per_thread_timer)
+		return;
+
+	/*
+	 * For each timer, if the timer wants per-thread events and
+	 * this thread used it, emit it.
+	 */
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++)
+		if (tr2_timer_metadata[tid].want_per_thread_events &&
+		    ctx->timer_block.timer[tid].interval_count)
+			fn_apply(&tr2_timer_metadata[tid],
+				 &ctx->timer_block.timer[tid],
+				 0);
+}
+
+void tr2_emit_final_timers(tr2_tgt_evt_timer_t *fn_apply)
+{
+	enum trace2_timer_id tid;
+
+	/*
+	 * Accessing `final_timer_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++)
+		if (final_timer_block.timer[tid].interval_count)
+			fn_apply(&tr2_timer_metadata[tid],
+				 &final_timer_block.timer[tid],
+				 1);
+}
diff --git a/trace2/tr2_tmr.h b/trace2/tr2_tmr.h
new file mode 100644
index 00000000000..d5753576134
--- /dev/null
+++ b/trace2/tr2_tmr.h
@@ -0,0 +1,140 @@
+#ifndef TR2_TMR_H
+#define TR2_TMR_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow "stopwatch" timers.
+ *
+ * Timers can be used to measure "interesting" activity that does not
+ * fit the "region" model, such as code called from many different
+ * regions (like zlib) and/or where data for individual calls are not
+ * interesting or are too numerous to be efficiently logged.
+ *
+ * Timer values are accumulated during program execution and emitted
+ * to the Trace2 logs at program exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set of
+ * timers and timer ids using a "timer block" array in thread-local
+ * storage.  This gives us constant time access to each timer within
+ * each thread, since we want start/stop operations to be as fast as
+ * possible.  This lets us avoid the complexities of dynamically
+ * allocating a timer on the first use by a thread and/or possibly
+ * sharing that timer definition with other concurrent threads.
+ * However, this does require that we define the set of timers at
+ * compile time.
+ *
+ * Each thread uses the timer block in its thread-local storage to
+ * compute partial sums for each timer (without locking).  When a
+ * thread exits, those partial sums are (under lock) added to the
+ * global final sum.
+ *
+ * Using this "timer block" model costs ~48 bytes per timer per thread
+ * (we have about six uint64 fields per timer).  This does increase
+ * the size of the thread-local storage block, but it is allocated (at
+ * thread create time) and not on the thread stack, so I'm not worried
+ * about the size.
+ *
+ * Partial sums for each timer are optionally emitted when a thread
+ * exits.
+ *
+ * Final sums for each timer are emitted between the "exit" and
+ * "atexit" events.
+ *
+ * A parallel "timer metadata" table contains the "category" and "name"
+ * fields for each timer.  This eliminates the need to include those
+ * args in the various timer APIs.
+ */
+
+/*
+ * The definition of an individual timer as used by an individual
+ * thread.
+ */
+struct tr2_timer {
+	/*
+	 * Total elapsed time for this timer in this thread in nanoseconds.
+	 */
+	uint64_t total_ns;
+
+	/*
+	 * The maximum and minimum interval values observed for this
+	 * timer in this thread.
+	 */
+	uint64_t min_ns;
+	uint64_t max_ns;
+
+	/*
+	 * The value of the clock when this timer was started in this
+	 * thread.  (Undefined when the timer is not active in this
+	 * thread.)
+	 */
+	uint64_t start_ns;
+
+	/*
+	 * Number of times that this timer has been started and stopped
+	 * in this thread.  (Recursive starts are ignored.)
+	 */
+	uint64_t interval_count;
+
+	/*
+	 * Number of nested starts on the stack in this thread.  (We
+	 * ignore recursive starts and use this to track the recursive
+	 * calls.)
+	 */
+	unsigned int recursion_count;
+};
+
+/*
+ * Metadata for a timer.
+ */
+struct tr2_timer_metadata {
+	const char *category;
+	const char *name;
+
+	/*
+	 * True if we should emit per-thread events for this timer
+	 * when individual threads exit.
+	 */
+	unsigned int want_per_thread_events:1;
+};
+
+/*
+ * A compile-time fixed-size block of timers to insert into
+ * thread-local storage.  This wrapper is used to avoid quirks
+ * of C and the usual need to pass an array size argument.
+ */
+struct tr2_timer_block {
+	struct tr2_timer timer[TRACE2_NUMBER_OF_TIMERS];
+};
+
+/*
+ * Private routines used by trace2.c to actually start/stop an
+ * individual timer in the current thread.
+ */
+void tr2_start_timer(enum trace2_timer_id tid);
+void tr2_stop_timer(enum trace2_timer_id tid);
+
+/*
+ * Add the current thread's timer data to the global totals.
+ * This is called during thread-exit.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_update_final_timers(void);
+
+/*
+ * Emit per-thread timer data for the current thread.
+ * This is called during thread-exit.
+ */
+void tr2_emit_per_thread_timers(tr2_tgt_evt_timer_t *fn_apply);
+
+/*
+ * Emit global total timer values.
+ * This is called during atexit handling.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_emit_final_timers(tr2_tgt_evt_timer_t *fn_apply);
+
+#endif /* TR2_TMR_H */
-- 
gitgitgadget
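
The stopwatch bookkeeping described in trace2/tr2_tmr.h above (ignore
nested starts, track total/min/max per thread, then fold each thread's
partial sums into a final block) can be sketched in isolation. This is
only an illustration under stated assumptions, not the code in the
patch: every `demo_*` name is hypothetical, and the clock value is
passed in explicitly instead of calling getnanotime() so that the
behavior is deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "struct tr2_timer". */
struct demo_timer {
	uint64_t total_ns;
	uint64_t min_ns;
	uint64_t max_ns;
	uint64_t start_ns;
	uint64_t interval_count;
	unsigned int recursion_count;
};

/* Start the timer; nested starts only bump the recursion count. */
static void demo_timer_start(struct demo_timer *t, uint64_t now_ns)
{
	t->recursion_count++;
	if (t->recursion_count > 1)
		return; /* ignore recursive starts */
	t->start_ns = now_ns;
}

/* Stop the timer; only the outermost stop closes an interval. */
static void demo_timer_stop(struct demo_timer *t, uint64_t now_ns)
{
	uint64_t ns_interval;

	assert(t->recursion_count > 0);
	t->recursion_count--;
	if (t->recursion_count)
		return; /* still in recursive call(s) */

	ns_interval = now_ns - t->start_ns;
	t->total_ns += ns_interval;

	/* min_ns starts at zero, so seed min/max on first use. */
	if (!t->interval_count || ns_interval < t->min_ns)
		t->min_ns = ns_interval;
	if (!t->interval_count || ns_interval > t->max_ns)
		t->max_ns = ns_interval;
	t->interval_count++;
}

/*
 * Fold one thread's partial sums into the final totals.  In the
 * real code the caller would hold a mutex around this step.
 */
static void demo_timer_merge(struct demo_timer *t_final,
			     const struct demo_timer *t)
{
	if (!t->interval_count)
		return; /* this timer was not used by this thread */

	t_final->total_ns += t->total_ns;
	if (!t_final->interval_count || t->min_ns < t_final->min_ns)
		t_final->min_ns = t->min_ns;
	if (!t_final->interval_count || t->max_ns > t_final->max_ns)
		t_final->max_ns = t->max_ns;
	t_final->interval_count += t->interval_count;
}
```

A caller would zero-initialize one `struct demo_timer` per thread,
bracket the interesting code with demo_timer_start() and
demo_timer_stop(), and call demo_timer_merge() once when the thread
exits.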


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v4 8/8] trace2: add global counter mechanism
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                         ` (6 preceding siblings ...)
  2022-10-24 13:41       ` [PATCH v4 7/8] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
@ 2022-10-24 13:41       ` Jeff Hostetler via GitGitGadget
  2022-10-25 12:27       ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Derrick Stolee
  8 siblings, 0 replies; 73+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2022-10-24 13:41 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add a global counter mechanism to Trace2.

The Trace2 counters mechanism adds the ability to create a set of
global counter variables and an API to increment them efficiently.
Counters can optionally report per-thread usage in addition to the sum
across all threads.

Counter events are emitted to the Trace2 logs when a thread exits and
at process exit.

Counters are an alternative to `data` and `data_json` events.

Counters are useful when you want to measure something across the life
of the process, when you don't want per-measurement events for
performance reasons, when the data does not fit conveniently within a
region, or when your control flow does not easily let you write the
final total.  For example, you might use this to report the number of
calls to unzip() or the number of de-delta steps during a checkout.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
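The partial-sum model described above (each thread adds to its own
counter block without locking, and the block is folded into a global
total under a mutex when the thread exits) can be sketched as a
standalone program fragment. This is a minimal illustration and not the
Trace2 code: every `demo_*` name is hypothetical, and a stack-local
block stands in for the thread-local storage that the real code gets
from tr2tls_get_self().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ids; they start at 0 and are contiguous (array indexes). */
enum demo_counter_id {
	DEMO_COUNTER_TEST1 = 0,
	DEMO_COUNTER_TEST2,
	DEMO_NUMBER_OF_COUNTERS
};

struct demo_counter_block {
	uint64_t value[DEMO_NUMBER_OF_COUNTERS];
};

#define DEMO_MAX_THREADS 16

static struct demo_counter_block demo_final_block; /* access under demo_mutex */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Add to the calling thread's partial sum; no locking is needed. */
static void demo_counter_add(struct demo_counter_block *self,
			     enum demo_counter_id cid, uint64_t value)
{
	assert((int)cid >= 0 && (int)cid < (int)DEMO_NUMBER_OF_COUNTERS);
	self->value[cid] += value;
}

/* Fold one thread's partial sums into the global totals (under lock). */
static void demo_update_final(const struct demo_counter_block *self)
{
	int cid;

	pthread_mutex_lock(&demo_mutex);
	for (cid = 0; cid < DEMO_NUMBER_OF_COUNTERS; cid++)
		demo_final_block.value[cid] += self->value[cid];
	pthread_mutex_unlock(&demo_mutex);
}

static void *demo_thread_proc(void *arg)
{
	struct demo_counter_block self; /* stand-in for thread-local storage */

	(void)arg;
	memset(&self, 0, sizeof(self));

	demo_counter_add(&self, DEMO_COUNTER_TEST2, 7);
	demo_counter_add(&self, DEMO_COUNTER_TEST2, 13);

	demo_update_final(&self); /* "thread exit" */
	return NULL;
}

/* Run nr_threads workers and return the final TEST2 total. */
static uint64_t demo_run(int nr_threads)
{
	pthread_t tids[DEMO_MAX_THREADS];
	int k;

	assert(nr_threads >= 0 && nr_threads <= DEMO_MAX_THREADS);
	memset(&demo_final_block, 0, sizeof(demo_final_block));

	for (k = 0; k < nr_threads; k++)
		if (pthread_create(&tids[k], NULL, demo_thread_proc, NULL))
			abort();
	for (k = 0; k < nr_threads; k++)
		if (pthread_join(tids[k], NULL))
			abort();

	/* All workers are joined, so reading without the lock is safe. */
	return demo_final_block.value[DEMO_COUNTER_TEST2];
}
```

Calling demo_run(3) mirrors the "201counter 7 13 3" test below: each of
the three threads contributes 7 + 13 = 20, and the final total is 60.
The total is only meaningful after every thread has been joined, which
is why the add function does not return a running value.
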
 Documentation/technical/api-trace2.txt |  31 ++++++++
 Makefile                               |   1 +
 t/helper/test-trace2.c                 |  89 +++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  46 +++++++++++
 trace2.c                               |  52 +++++++++++--
 trace2.h                               |  37 +++++++++
 trace2/tr2_ctr.c                       | 101 ++++++++++++++++++++++++
 trace2/tr2_ctr.h                       | 104 +++++++++++++++++++++++++
 trace2/tr2_tgt.h                       |   7 ++
 trace2/tr2_tgt_event.c                 |  19 +++++
 trace2/tr2_tgt_normal.c                |  16 ++++
 trace2/tr2_tgt_perf.c                  |  17 ++++
 trace2/tr2_tls.h                       |   4 +
 13 files changed, 517 insertions(+), 7 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 75ce6f45603..de5fc250595 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -805,6 +805,37 @@ The "value" field may be an integer or a string.
 }
 ------------
 
+`"th_counter"`::
+	This event logs the value of a counter variable in a thread.
+	This event is generated when a thread exits for counters that
+	requested per-thread events.
++
+------------
+{
+	"event":"th_counter",
+	...
+	"category":"my_category",
+	"name":"my_counter",
+	"count":23
+}
+------------
+
+`"counter"`::
+	This event logs the value of a counter variable across all threads.
+	This event is generated when the process exits.  The total value
+	reported here is the sum across all threads.
++
+------------
+{
+	"event":"counter",
+	...
+	"category":"my_category",
+	"name":"my_counter",
+	"count":23
+}
+------------
+
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
diff --git a/Makefile b/Makefile
index 820649bf62a..29ab417ca3a 100644
--- a/Makefile
+++ b/Makefile
@@ -1094,6 +1094,7 @@ LIB_OBJS += trace.o
 LIB_OBJS += trace2.o
 LIB_OBJS += trace2/tr2_cfg.o
 LIB_OBJS += trace2/tr2_cmd_name.o
+LIB_OBJS += trace2/tr2_ctr.o
 LIB_OBJS += trace2/tr2_dst.o
 LIB_OBJS += trace2/tr2_sid.o
 LIB_OBJS += trace2/tr2_sysenv.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index f951b9e97d7..1b092c60714 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -323,6 +323,92 @@ static int ut_101timer(int argc, const char **argv)
 	return 0;
 }
 
+/*
+ * Single-threaded counter test.  Add several values to the TEST1 counter.
+ * The test script can verify that the final sum is reported in the "counter"
+ * event.
+ */
+static int ut_200counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> [<v2> [...]]";
+	int value;
+	int k;
+
+	if (argc < 1)
+		die("%s", usage_error);
+
+	for (k = 0; k < argc; k++) {
+		if (get_i(&value, argv[k]))
+			die("invalid value[%s] -- %s",
+			    argv[k], usage_error);
+		trace2_counter_add(TRACE2_COUNTER_ID_TEST1, value);
+	}
+
+	return 0;
+}
+
+/*
+ * Multi-threaded counter test.  Create several threads that each increment
+ * the TEST2 global counter.  The test script can verify that an individual
+ * "th_counter" event is generated with a partial sum for each thread and
+ * that a final aggregate "counter" event is generated.
+ */
+
+struct ut_201_data {
+	int v1;
+	int v2;
+};
+
+static void *ut_201counter_thread_proc(void *_ut_201_data)
+{
+	struct ut_201_data *data = _ut_201_data;
+
+	trace2_thread_start("ut_201");
+
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v1);
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v2);
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+static int ut_201counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> <v2> <threads>";
+
+	struct ut_201_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.v1, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.v2, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_201counter_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -346,6 +432,9 @@ static struct unit_test ut_table[] = {
 
 	{ ut_100timer,    "100timer",  "<count> <ms_delay>" },
 	{ ut_101timer,    "101timer",  "<count> <ms_delay> <threads>" },
+
+	{ ut_200counter,  "200counter", "<v1> [<v2> [<v3> [...]]]" },
+	{ ut_201counter,  "201counter", "<v1> <v2> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 5c28424e657..0b3436e8cac 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -222,4 +222,50 @@ test_expect_success 'stopwatch timer test/test2' '
 	have_timer_event "main" "timer" "test" "test2" 15 actual
 '
 
+# Exercise the global counters and confirm that we get the expected values.
+#
+# The counter "test/test1" should only emit a global summary "counter" event.
+# The counter "test/test2" could emit per-thread "th_counter" events and a
+# global summary "counter" event.
+
+have_counter_event () {
+	thread=$1 event=$2 category=$3 name=$4 value=$5 file=$6 &&
+
+	pattern="d0|${thread}|${event}||||${category}|name:${name} value:${value}" &&
+
+	grep "${pattern}" ${file}
+}
+
+test_expect_success 'global counter test/test1' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Use the counter "test1" and add n integers.
+	test-tool trace2 200counter 1 2 3 4 5 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_counter_event "main" "counter" "test" "test1" 15 actual
+'
+
+test_expect_success 'global counter test/test2' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+
+	# Add 2 integers to the counter "test2" in each of 3 threads.
+	test-tool trace2 201counter 7 13 3 &&
+
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	# So we should have 3 per-thread events of 20 each.
+	have_counter_event "th01:ut_201" "th_counter" "test" "test2" 20 actual &&
+	have_counter_event "th02:ut_201" "th_counter" "test" "test2" 20 actual &&
+	have_counter_event "th03:ut_201" "th_counter" "test" "test2" 20 actual &&
+
+	# And we should have a single event with the total across all threads.
+	have_counter_event "main" "counter" "test" "test2" 60 actual
+'
+
 test_done
diff --git a/trace2.c b/trace2.c
index a93cab7c2b7..279bddf53b4 100644
--- a/trace2.c
+++ b/trace2.c
@@ -8,6 +8,7 @@
 #include "version.h"
 #include "trace2/tr2_cfg.h"
 #include "trace2/tr2_cmd_name.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
 #include "trace2/tr2_sysenv.h"
@@ -101,6 +102,22 @@ static void tr2_tgt_emit_a_timer(const struct tr2_timer_metadata *meta,
 			tgt_j->pfn_timer(meta, timer, is_final_data);
 }
 
+/*
+ * The signature of this function must match the pfn_counter
+ * method in the targets.
+ */
+static void tr2_tgt_emit_a_counter(const struct tr2_counter_metadata *meta,
+				   const struct tr2_counter *counter,
+				   int is_final_data)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_counter)
+			tgt_j->pfn_counter(meta, counter, is_final_data);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -132,20 +149,26 @@ static void tr2main_atexit_handler(void)
 	 * Some timers want per-thread details.  If the main thread
 	 * used one of those timers, emit the details now (before
 	 * we emit the aggregate timer values).
+	 *
+	 * Likewise for counters.
 	 */
 	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_per_thread_counters(tr2_tgt_emit_a_counter);
 
 	/*
-	 * Add stopwatch timer data for the main thread to the final
-	 * totals.  And then emit the final timer values.
+	 * Add stopwatch timer and counter data for the main thread to
+	 * the final totals.  And then emit the final values.
 	 *
 	 * Technically, we shouldn't need to hold the lock to update
-	 * and output the final_timer_block (since all other threads
-	 * should be dead by now), but it doesn't hurt anything.
+	 * and output the final_timer_block and final_counter_block
+	 * (since all other threads should be dead by now), but it
+	 * doesn't hurt anything.
 	 */
 	tr2tls_lock();
 	tr2_update_final_timers();
+	tr2_update_final_counters();
 	tr2_emit_final_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_final_counters(tr2_tgt_emit_a_counter);
 	tr2tls_unlock();
 
 	for_each_wanted_builtin (j, tgt_j)
@@ -582,16 +605,20 @@ void trace2_thread_exit_fl(const char *file, int line)
 	/*
 	 * Some timers want per-thread details.  If this thread used
 	 * one of those timers, emit the details now.
+	 *
+	 * Likewise for counters.
 	 */
 	tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer);
+	tr2_emit_per_thread_counters(tr2_tgt_emit_a_counter);
 
 	/*
-	 * Add stopwatch timer data from the current (non-main) thread
-	 * to the final totals.  (We'll accumulate data for the main
-	 * thread later during "atexit".)
+	 * Add stopwatch timer and counter data from the current
+	 * (non-main) thread to the final totals.  (We'll accumulate
+	 * data for the main thread later during "atexit".)
 	 */
 	tr2tls_lock();
 	tr2_update_final_timers();
+	tr2_update_final_counters();
 	tr2tls_unlock();
 
 	for_each_wanted_builtin (j, tgt_j)
@@ -870,6 +897,17 @@ void trace2_timer_stop(enum trace2_timer_id tid)
 	tr2_stop_timer(tid);
 }
 
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (cid < 0 || cid >= TRACE2_NUMBER_OF_COUNTERS)
+		BUG("trace2_counter_add: invalid counter id: %d", cid);
+
+	tr2_counter_increment(cid, value);
+}
+
 const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
diff --git a/trace2.h b/trace2.h
index 7a843ac0518..4ced30c0db3 100644
--- a/trace2.h
+++ b/trace2.h
@@ -52,6 +52,7 @@ struct json_writer;
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
  * [] trace2_timer*     -- stopwatch timers (messages are deferred).
+ * [] trace2_counter*   -- global counters (messages are deferred).
  */
 
 /*
@@ -528,6 +529,42 @@ enum trace2_timer_id {
 void trace2_timer_start(enum trace2_timer_id tid);
 void trace2_timer_stop(enum trace2_timer_id tid);
 
+/*
+ * Define the set of global counters.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any values added to this enum must also be added to the
+ * `tr2_counter_metadata[]` in `trace2/tr2_ctr.c`.
+ */
+enum trace2_counter_id {
+	/*
+	 * Define two counters for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
+
+	/* Add additional counter definitions before here. */
+	TRACE2_NUMBER_OF_COUNTERS
+};
+
+/*
+ * Increase the named global counter by value.
+ *
+ * Note that this adds `value` to the current thread's partial sum for
+ * this counter (without locking) and that the complete sum is not
+ * available until all threads have exited, so it does not return the
+ * new value of the counter.
+ */
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value);
+
 /*
  * Optional platform-specific code to dump information about the
  * current and any parent process(es).  This is intended to allow
diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c
new file mode 100644
index 00000000000..483ca7c308f
--- /dev/null
+++ b/trace2/tr2_ctr.c
@@ -0,0 +1,101 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tgt.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_ctr.h"
+
+/*
+ * A global counter block to aggregate values from the partial sums
+ * from each thread.
+ */
+static struct tr2_counter_block final_counter_block; /* access under tr2tls_mutex */
+
+/*
+ * Define metadata for each global counter.
+ *
+ * This array must match the "enum trace2_counter_id" and the values
+ * in "struct tr2_counter_block.counter[*]".
+ */
+static struct tr2_counter_metadata tr2_counter_metadata[TRACE2_NUMBER_OF_COUNTERS] = {
+	[TRACE2_COUNTER_ID_TEST1] = {
+		.category = "test",
+		.name = "test1",
+		.want_per_thread_events = 0,
+	},
+	[TRACE2_COUNTER_ID_TEST2] = {
+		.category = "test",
+		.name = "test2",
+		.want_per_thread_events = 1,
+	},
+
+	/* Add additional metadata before here. */
+};
+
+void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_counter *c = &ctx->counter_block.counter[cid];
+
+	c->value += value;
+
+	ctx->used_any_counter = 1;
+	if (tr2_counter_metadata[cid].want_per_thread_events)
+		ctx->used_any_per_thread_counter = 1;
+}
+
+void tr2_update_final_counters(void)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_counter_id cid;
+
+	if (!ctx->used_any_counter)
+		return;
+
+	/*
+	 * Accessing `final_counter_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) {
+		struct tr2_counter *c_final = &final_counter_block.counter[cid];
+		const struct tr2_counter *c = &ctx->counter_block.counter[cid];
+
+		c_final->value += c->value;
+	}
+}
+
+void tr2_emit_per_thread_counters(tr2_tgt_evt_counter_t *fn_apply)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	enum trace2_counter_id cid;
+
+	if (!ctx->used_any_per_thread_counter)
+		return;
+
+	/*
+	 * For each counter, if the counter wants per-thread events
+	 * and this thread used it (the value is non-zero), emit it.
+	 */
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++)
+		if (tr2_counter_metadata[cid].want_per_thread_events &&
+		    ctx->counter_block.counter[cid].value)
+			fn_apply(&tr2_counter_metadata[cid],
+				 &ctx->counter_block.counter[cid],
+				 0);
+}
+
+void tr2_emit_final_counters(tr2_tgt_evt_counter_t *fn_apply)
+{
+	enum trace2_counter_id cid;
+
+	/*
+	 * Accessing `final_counter_block` requires holding `tr2tls_mutex`.
+	 * We assume that our caller is holding the lock.
+	 */
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++)
+		if (final_counter_block.counter[cid].value)
+			fn_apply(&tr2_counter_metadata[cid],
+				 &final_counter_block.counter[cid],
+				 1);
+}
diff --git a/trace2/tr2_ctr.h b/trace2/tr2_ctr.h
new file mode 100644
index 00000000000..a2267ee9901
--- /dev/null
+++ b/trace2/tr2_ctr.h
@@ -0,0 +1,104 @@
+#ifndef TR2_CTR_H
+#define TR2_CTR_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow global "counters".
+ *
+ * Counters can be used to count interesting activity that does not
+ * fit the "region and data" model, such as code called from many
+ * different regions and/or where you want to count a number of items,
+ * but don't have control of when the last item will be processed,
+ * such as counting the number of calls to `lstat()`.
+ *
+ * Counters differ from Trace2 "data" events.  Data events are emitted
+ * immediately and are appropriate for documenting loop counters at
+ * the end of a region, for example.  Counter values are accumulated
+ * during the program and final counter values are emitted at program
+ * exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set of
+ * counters and counter ids using a fixed size "counter block" array
+ * in thread-local storage.  This gives us constant time, lock-free
+ * access to each counter within each thread.  This lets us avoid the
+ * complexities of dynamically allocating a counter and sharing that
+ * definition with other threads.
+ *
+ * Each thread uses the counter block in its thread-local storage to
+ * increment partial sums for each counter (without locking).  When a
+ * thread exits, those partial sums are (under lock) added to the
+ * global final sum.
+ *
+ * Partial sums for each counter are optionally emitted when a thread
+ * exits.
+ *
+ * Final sums for each counter are emitted between the "exit" and
+ * "atexit" events.
+ *
+ * A parallel "counter metadata" table contains the "category" and
+ * "name" fields for each counter.  This eliminates the need to
+ * include those args in the various counter APIs.
+ */
+
+/*
+ * The definition of an individual counter as used by an individual
+ * thread (and later in aggregation).
+ */
+struct tr2_counter {
+	uint64_t value;
+};
+
+/*
+ * Metadata for a counter.
+ */
+struct tr2_counter_metadata {
+	const char *category;
+	const char *name;
+
+	/*
+	 * True if we should emit per-thread events for this counter
+	 * when individual threads exit.
+	 */
+	unsigned int want_per_thread_events:1;
+};
+
+/*
+ * A compile-time fixed block of counters to insert into thread-local
+ * storage.  This wrapper is used to avoid quirks of C and the usual
+ * need to pass an array size argument.
+ */
+struct tr2_counter_block {
+	struct tr2_counter counter[TRACE2_NUMBER_OF_COUNTERS];
+};
+
+/*
+ * Private routines used by trace2.c to increment a counter for the
+ * current thread.
+ */
+void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value);
+
+/*
+ * Add the current thread's counter data to the global totals.
+ * This is called during thread-exit.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_update_final_counters(void);
+
+/*
+ * Emit per-thread counter data for the current thread.
+ * This is called during thread-exit.
+ */
+void tr2_emit_per_thread_counters(tr2_tgt_evt_counter_t *fn_apply);
+
+/*
+ * Emit global counter values.
+ * This is called during atexit handling.
+ *
+ * Caller must be holding the tr2tls_mutex.
+ */
+void tr2_emit_final_counters(tr2_tgt_evt_counter_t *fn_apply);
+
+#endif /* TR2_CTR_H */
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 85c8d2d7f5a..bf8745c4f05 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -6,6 +6,8 @@ struct repository;
 struct json_writer;
 struct tr2_timer_metadata;
 struct tr2_timer;
+struct tr2_counter_metadata;
+struct tr2_counter;
 
 #define NS_TO_SEC(ns) ((double)(ns) / 1.0e9)
 
@@ -104,6 +106,10 @@ typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta,
 				  const struct tr2_timer *timer,
 				  int is_final_data);
 
+typedef void(tr2_tgt_evt_counter_t)(const struct tr2_counter_metadata *meta,
+				    const struct tr2_counter *counter,
+				    int is_final_data);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -141,6 +147,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
 	tr2_tgt_evt_timer_t                     *pfn_timer;
+	tr2_tgt_evt_counter_t                   *pfn_counter;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index af5a8edb474..16f6332755e 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -642,6 +642,24 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	jw_release(&jw);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	jw_object_string(&jw, "category", meta->category);
+	jw_object_string(&jw, "name", meta->name);
+	jw_object_intmax(&jw, "count", counter->value);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	.pdst = &tr2dst_event,
 
@@ -674,4 +692,5 @@ struct tr2_tgt tr2_tgt_event = {
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = NULL,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index b079baf1002..fbbef68dfc0 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -351,6 +351,21 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, "%s %s/%s value:%"PRIu64,
+		    event_name, meta->category, meta->name,
+		    counter->value);
+
+	normal_io_write_fl(__FILE__, __LINE__, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_normal = {
 	.pdst = &tr2dst_normal,
 
@@ -383,4 +398,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	.pfn_data_json_fl = NULL,
 	.pfn_printf_va_fl = fn_printf_va_fl,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index e69375e9799..adae8032639 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -578,6 +578,22 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_counter(const struct tr2_counter_metadata *meta,
+		       const struct tr2_counter *counter,
+		       int is_final_data)
+{
+	const char *event_name = is_final_data ? "counter" : "th_counter";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, "name:%s value:%"PRIu64,
+		    meta->name,
+		    counter->value);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, NULL, NULL,
+			 meta->category, &buf_payload);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	.pdst = &tr2dst_perf,
 
@@ -610,4 +626,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	.pfn_data_json_fl = fn_data_json_fl,
 	.pfn_printf_va_fl = fn_printf_va_fl,
 	.pfn_timer = fn_timer,
+	.pfn_counter = fn_counter,
 };
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index a064b66e4cc..f9049805d4d 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_tmr.h"
 
 /*
@@ -22,8 +23,11 @@ struct tr2tls_thread_ctx {
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 	struct tr2_timer_block timer_block;
+	struct tr2_counter_block counter_block;
 	unsigned int used_any_timer:1;
 	unsigned int used_any_per_thread_timer:1;
+	unsigned int used_any_counter:1;
+	unsigned int used_any_per_thread_counter:1;
 };
 
 /*
-- 
gitgitgadget
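
For readers skimming the patch above, here is a minimal sketch of the payload text that the "normal" and "perf" targets build for a counter. It copies only the format strings from the two `fn_counter()` implementations. The `demo_*` structs and functions are hypothetical stand-ins, not part of trace2, and the surrounding columns that `normal_io_write_fl()` and `perf_io_write_fl()` add are not reproduced.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for tr2_counter_metadata and tr2_counter. */
struct demo_counter_metadata {
	const char *category;
	const char *name;
};

struct demo_counter {
	uint64_t value; /* assumed 64-bit, since the patch prints it with PRIu64 */
};

/* "normal" target payload: "<event> <category>/<name> value:<n>". */
int demo_format_normal(char *buf, size_t len,
		       const struct demo_counter_metadata *meta,
		       const struct demo_counter *counter,
		       int is_final_data)
{
	/* Per-thread reports use "th_counter"; the final sum uses "counter". */
	const char *event_name = is_final_data ? "counter" : "th_counter";

	return snprintf(buf, len, "%s %s/%s value:%" PRIu64,
			event_name, meta->category, meta->name,
			counter->value);
}

/*
 * "perf" target payload: "name:<name> value:<n>".  In the patch the event
 * name and category are passed to perf_io_write_fl() as separate columns,
 * so they do not appear in the payload itself.
 */
int demo_format_perf(char *buf, size_t len,
		     const struct demo_counter_metadata *meta,
		     const struct demo_counter *counter)
{
	return snprintf(buf, len, "name:%s value:%" PRIu64,
			meta->name, counter->value);
}
```

With a counter in category "test" named "test1" holding 42, the normal payload would read `counter test/test1 value:42` for the final data and `th_counter test/test1 value:42` for a per-thread report.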


* Re: [PATCH v4 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2022-10-24 13:41       ` [PATCH v4 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
@ 2022-10-24 20:31         ` Junio C Hamano
  2022-10-25 12:35           ` Derrick Stolee
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2022-10-24 20:31 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Jeff Hostetler

As I do not see a cover letter for this series, here is the summary
of the change since the previous round that has been in 'seen'.

I didn't see anything questionable in these.

Thanks, will queue.

 trace2/tr2_tgt.h        | 2 +-
 trace2/tr2_tgt_event.c  | 6 +++---
 trace2/tr2_tgt_normal.c | 6 +++---
 trace2/tr2_tgt_perf.c   | 6 +++---
 trace2/tr2_tls.h        | 2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git c/trace2/tr2_tgt.h w/trace2/tr2_tgt.h
index 95f4c75472..bf8745c4f0 100644
--- c/trace2/tr2_tgt.h
+++ w/trace2/tr2_tgt.h
@@ -9,7 +9,7 @@ struct tr2_timer;
 struct tr2_counter_metadata;
 struct tr2_counter;
 
-#define NS_PER_SEC_D ((double)1000*1000*1000)
+#define NS_TO_SEC(ns) ((double)(ns) / 1.0e9)
 
 /*
  * Function prototypes for a TRACE2 "target" vtable.
diff --git c/trace2/tr2_tgt_event.c w/trace2/tr2_tgt_event.c
index 981863a660..16f6332755 100644
--- c/trace2/tr2_tgt_event.c
+++ w/trace2/tr2_tgt_event.c
@@ -624,9 +624,9 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 {
 	const char *event_name = is_final_data ? "timer" : "th_timer";
 	struct json_writer jw = JSON_WRITER_INIT;
-	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
-	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
-	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
+	double t_total = NS_TO_SEC(timer->total_ns);
+	double t_min = NS_TO_SEC(timer->min_ns);
+	double t_max = NS_TO_SEC(timer->max_ns);
 
 	jw_object_begin(&jw, 0);
 	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
diff --git c/trace2/tr2_tgt_normal.c w/trace2/tr2_tgt_normal.c
index def18674e8..fbbef68dfc 100644
--- c/trace2/tr2_tgt_normal.c
+++ w/trace2/tr2_tgt_normal.c
@@ -336,9 +336,9 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 {
 	const char *event_name = is_final_data ? "timer" : "th_timer";
 	struct strbuf buf_payload = STRBUF_INIT;
-	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
-	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
-	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
+	double t_total = NS_TO_SEC(timer->total_ns);
+	double t_min = NS_TO_SEC(timer->min_ns);
+	double t_max = NS_TO_SEC(timer->max_ns);
 
 	strbuf_addf(&buf_payload, ("%s %s/%s"
 				   " intervals:%"PRIu64
diff --git c/trace2/tr2_tgt_perf.c w/trace2/tr2_tgt_perf.c
index db94b2ef47..adae803263 100644
--- c/trace2/tr2_tgt_perf.c
+++ w/trace2/tr2_tgt_perf.c
@@ -562,9 +562,9 @@ static void fn_timer(const struct tr2_timer_metadata *meta,
 {
 	const char *event_name = is_final_data ? "timer" : "th_timer";
 	struct strbuf buf_payload = STRBUF_INIT;
-	double t_total = ((double)timer->total_ns) / NS_PER_SEC_D;
-	double t_min = ((double)timer->min_ns) / NS_PER_SEC_D;
-	double t_max = ((double)timer->max_ns) / NS_PER_SEC_D;
+	double t_total = NS_TO_SEC(timer->total_ns);
+	double t_min = NS_TO_SEC(timer->min_ns);
+	double t_max = NS_TO_SEC(timer->max_ns);
 
 	strbuf_addf(&buf_payload, ("name:%s"
 				   " intervals:%"PRIu64
diff --git c/trace2/tr2_tls.h w/trace2/tr2_tls.h
index 289b62d072..f9049805d4 100644
--- c/trace2/tr2_tls.h
+++ w/trace2/tr2_tls.h
@@ -38,7 +38,7 @@ struct tr2tls_thread_ctx {
  * Subsequent threads are given a non-zero thread_id and a thread_name
  * constructed from the id and a thread base name (which is usually just
  * the name of the thread-proc function).  For example:
- *     { .thread_id=10, .thread_name="th10fsm-listen" }
+ *     { .thread_id=10, .thread_name="th10:fsm-listen" }
  * This helps to identify and distinguish messages from concurrent threads.
  * The ctx.thread_name field is truncated if necessary to help with column
  * alignment in printf-style messages.
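
As a small illustration of the macro change in the interdiff above, the sketch below puts the old and new conversions side by side. Both macro definitions are copied from the diff. The `demo_timer` struct and the two helper functions are hypothetical, and the 64-bit field type is an assumption.

```c
#include <stdint.h>

/* Old form: a double-typed constant that every caller had to divide by. */
#define NS_PER_SEC_D ((double)1000*1000*1000)

/* New form: the cast and the division both live inside the macro. */
#define NS_TO_SEC(ns) ((double)(ns) / 1.0e9)

/* Hypothetical stand-in holding only the fields fn_timer() reads. */
struct demo_timer {
	uint64_t total_ns;
	uint64_t min_ns;
	uint64_t max_ns;
};

/* Conversion as written before the change. */
double demo_total_old(const struct demo_timer *t)
{
	return ((double)t->total_ns) / NS_PER_SEC_D;
}

/* Conversion as written after the change. */
double demo_total_new(const struct demo_timer *t)
{
	return NS_TO_SEC(t->total_ns);
}
```

Both helpers compute the same quotient; the new macro just removes the repeated cast at each call site.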


* Re: [PATCH v4 0/8] Trace2 timers and counters and some cleanup
  2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
                         ` (7 preceding siblings ...)
  2022-10-24 13:41       ` [PATCH v4 8/8] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
@ 2022-10-25 12:27       ` Derrick Stolee
  2022-10-25 15:36         ` Junio C Hamano
  8 siblings, 1 reply; 73+ messages in thread
From: Derrick Stolee @ 2022-10-25 12:27 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget, git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Jeff Hostetler

On 10/24/2022 9:40 AM, Jeff Hostetler via GitGitGadget wrote:
> Here is version 4 of this series to add timers and counters to Trace2.
> 
> Changes since V3:
> 
>  * Fixed typo in the new thread-name documentation.
>  * Use a simpler NS_TO_SEC() macro for reporting the timer values.
> 
> Jeff Hostetler (8):
>   trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
>   tr2tls: clarify TLS terminology
>   api-trace2.txt: elminate section describing the public trace2 API
>   trace2: rename the thread_name argument to trace2_thread_start
>   trace2: improve thread-name documentation in the thread-context
>   trace2: convert ctx.thread_name from strbuf to pointer
>   trace2: add stopwatch timers
>   trace2: add global counter mechanism

I re-read the series as well as looked at the range-diffs for the
previous two versions. I continue to think this is a high-quality
series and I've used it multiple times in my personal development
workflow to investigate certain performance things. I'm looking
forward to this being merged so we can all use it.

Thanks,
-Stolee


* Re: [PATCH v4 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2022-10-24 20:31         ` Junio C Hamano
@ 2022-10-25 12:35           ` Derrick Stolee
  2022-10-25 15:40             ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Derrick Stolee @ 2022-10-25 12:35 UTC (permalink / raw)
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Jeff Hostetler

On 10/24/2022 4:31 PM, Junio C Hamano wrote:
> As I do not see a cover letter for this series, here is the summary
> of the change since the previous round that has been in 'seen'.
> 
> I didn't see anything questionable in these.
> 
> Thanks, will queue.

The cover letter appears on my end, but I'm on the CC list.

Jeff: be sure to CC Junio by adding him to the CC list on your
PR description for anything you want to have considered for
queuing.

Thanks,
-Stolee



* Re: [PATCH v4 0/8] Trace2 timers and counters and some cleanup
  2022-10-25 12:27       ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Derrick Stolee
@ 2022-10-25 15:36         ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2022-10-25 15:36 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Jeff Hostetler via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Jeff Hostetler

Derrick Stolee <derrickstolee@github.com> writes:

> On 10/24/2022 9:40 AM, Jeff Hostetler via GitGitGadget wrote:
>> Here is version 4 of this series to add timers and counters to Trace2.
>> 
>> Changes since V3:
>> 
>>  * Fixed typo in the new thread-name documentation.
>>  * Use a simpler NS_TO_SEC() macro for reporting the timer values.
>> 
>> Jeff Hostetler (8):
>>   trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
>>   tr2tls: clarify TLS terminology
>>   api-trace2.txt: elminate section describing the public trace2 API
>>   trace2: rename the thread_name argument to trace2_thread_start
>>   trace2: improve thread-name documentation in the thread-context
>>   trace2: convert ctx.thread_name from strbuf to pointer
>>   trace2: add stopwatch timers
>>   trace2: add global counter mechanism
>
> I re-read the series as well as looked at the range-diffs for the
> previous two versions. I continue to think this is a high-quality
> series and I've used it multiple times in my personal development
> workflow to investigate certain performance things. I'm looking
> forward to this being merged so we can all use it.

I agree with your assessment.  Let's move it forward.

Thanks, all.


* Re: [PATCH v4 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2022-10-25 12:35           ` Derrick Stolee
@ 2022-10-25 15:40             ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2022-10-25 15:40 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Jeff Hostetler via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Jeff Hostetler

Derrick Stolee <derrickstolee@github.com> writes:

> On 10/24/2022 4:31 PM, Junio C Hamano wrote:
>> As I do not see a cover letter for this series, here is the summary
>> of the change since the previous round that has been in 'seen'.
>> 
>> I didn't see anything questionable in these.
>> 
>> Thanks, will queue.
>
> The cover letter appears on my end, but I'm on the CC list.

Yeah, it seems vger was a bit constipated yesterday.  Everything
came through at the end, and I am happy with the series.

> Jeff: be sure to CC Junio by adding him to the CC list on your
> PR description for anything you want to have considered for
> queuing.

Everybody wants their non-RFC patches to be considered for
queuing, but vger is not that lossy ;-)



Thread overview: 73+ messages
2022-10-04 16:19 [PATCH 0/9] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
2022-10-04 16:19 ` [PATCH 1/9] builtin/merge-file: fix compiler warning on MacOS with clang 11.0.0 Jeff Hostetler via GitGitGadget
2022-10-04 16:20 ` [PATCH 2/9] builtin/unpack-objects.c: " Jeff Hostetler via GitGitGadget
2022-10-04 16:20 ` [PATCH 3/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
2022-10-04 16:20 ` [PATCH 4/9] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
2022-10-04 16:20 ` [PATCH 5/9] trace2: rename trace2 thread_name argument as name_hint Jeff Hostetler via GitGitGadget
2022-10-04 16:20 ` [PATCH 6/9] trace2: convert ctx.thread_name to flex array Jeff Hostetler via GitGitGadget
2022-10-05 11:14   ` Ævar Arnfjörð Bjarmason
2022-10-06 16:28     ` Jeff Hostetler
2022-10-10 18:31     ` Jeff Hostetler
2022-10-05 18:03   ` Junio C Hamano
2022-10-06 21:05     ` Ævar Arnfjörð Bjarmason
2022-10-06 21:50       ` Junio C Hamano
2022-10-07  1:10         ` [RFC PATCH] trace2 API: don't save a copy of constant "thread_name" Ævar Arnfjörð Bjarmason
2022-10-07  1:16           ` Junio C Hamano
2022-10-07 10:03             ` Ævar Arnfjörð Bjarmason
2022-10-10 19:16               ` Jeff Hostetler
2022-10-11 13:31                 ` Ævar Arnfjörð Bjarmason
2022-10-12 13:31                   ` Jeff Hostetler
2022-10-10 19:05           ` Jeff Hostetler
2022-10-11 12:52             ` Ævar Arnfjörð Bjarmason
2022-10-11 14:40               ` Junio C Hamano
2022-10-10 18:39       ` [PATCH 6/9] trace2: convert ctx.thread_name to flex array Jeff Hostetler
2022-10-04 16:20 ` [PATCH 7/9] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
2022-10-04 16:20 ` [PATCH 8/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
2022-10-04 16:20 ` [PATCH 9/9] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
2022-10-05 13:04 ` [PATCH 0/9] Trace2 timers and counters and some cleanup Ævar Arnfjörð Bjarmason
2022-10-06 15:45   ` Jeff Hostetler
2022-10-06 18:12 ` Derrick Stolee
2022-10-12 18:52 ` [PATCH v2 0/7] " Jeff Hostetler via GitGitGadget
2022-10-12 18:52   ` [PATCH v2 1/7] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
2022-10-12 18:52   ` [PATCH v2 2/7] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
2022-10-13 21:12     ` Junio C Hamano
2022-10-12 18:52   ` [PATCH v2 3/7] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
2022-10-12 18:52   ` [PATCH v2 4/7] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
2022-10-12 21:06     ` Ævar Arnfjörð Bjarmason
2022-10-20 14:40       ` Jeff Hostetler
2022-10-13 21:12     ` Junio C Hamano
2022-10-12 18:52   ` [PATCH v2 5/7] trace2: convert ctx.thread_name from strbuf to pointer Jeff Hostetler via GitGitGadget
2022-10-13 21:12     ` Junio C Hamano
2022-10-12 18:52   ` [PATCH v2 6/7] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
2022-10-13 21:12     ` Junio C Hamano
2022-10-20 14:42       ` Jeff Hostetler
2022-10-12 18:52   ` [PATCH v2 7/7] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
2022-10-20 18:28   ` [PATCH v3 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
2022-10-20 18:28     ` [PATCH v3 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
2022-10-20 18:28     ` [PATCH v3 2/8] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
2022-10-20 18:28     ` [PATCH v3 3/8] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
2022-10-20 18:28     ` [PATCH v3 4/8] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
2022-10-20 18:28     ` [PATCH v3 5/8] trace2: improve thread-name documentation in the thread-context Jeff Hostetler via GitGitGadget
2022-10-20 18:57       ` Ævar Arnfjörð Bjarmason
2022-10-20 20:15         ` Jeff Hostetler
2022-10-20 18:28     ` [PATCH v3 6/8] trace2: convert ctx.thread_name from strbuf to pointer Jeff Hostetler via GitGitGadget
2022-10-20 18:28     ` [PATCH v3 7/8] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
2022-10-20 20:25       ` Junio C Hamano
2022-10-20 20:52         ` Jeff Hostetler
2022-10-20 20:55           ` Junio C Hamano
2022-10-21 21:51             ` Jeff Hostetler
2022-10-20 18:28     ` [PATCH v3 8/8] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
2022-10-24 13:40     ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Jeff Hostetler via GitGitGadget
2022-10-24 13:41       ` [PATCH v4 1/8] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
2022-10-24 20:31         ` Junio C Hamano
2022-10-25 12:35           ` Derrick Stolee
2022-10-25 15:40             ` Junio C Hamano
2022-10-24 13:41       ` [PATCH v4 2/8] tr2tls: clarify TLS terminology Jeff Hostetler via GitGitGadget
2022-10-24 13:41       ` [PATCH v4 3/8] api-trace2.txt: elminate section describing the public trace2 API Jeff Hostetler via GitGitGadget
2022-10-24 13:41       ` [PATCH v4 4/8] trace2: rename the thread_name argument to trace2_thread_start Jeff Hostetler via GitGitGadget
2022-10-24 13:41       ` [PATCH v4 5/8] trace2: improve thread-name documentation in the thread-context Jeff Hostetler via GitGitGadget
2022-10-24 13:41       ` [PATCH v4 6/8] trace2: convert ctx.thread_name from strbuf to pointer Jeff Hostetler via GitGitGadget
2022-10-24 13:41       ` [PATCH v4 7/8] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
2022-10-24 13:41       ` [PATCH v4 8/8] trace2: add global counter mechanism Jeff Hostetler via GitGitGadget
2022-10-25 12:27       ` [PATCH v4 0/8] Trace2 timers and counters and some cleanup Derrick Stolee
2022-10-25 15:36         ` Junio C Hamano
