git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH] trace2: don't overload target directories
@ 2019-07-29 22:20 Josh Steadmon
  2019-07-30 13:29 ` Derrick Stolee
                   ` (6 more replies)
  0 siblings, 7 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-07-29 22:20 UTC (permalink / raw)
  To: git

trace2 can write files into a target directory. With heavy usage, this
directory can fill up with files, causing difficulty for
trace-processing systems.

When trace2 would write a file to a target directory, first check
whether or not the directory is overloaded. A directory is overloaded if
there is a sentinel file declaring an overload, or if the number of
files exceeds a threshold. If the latter, create a sentinel file to
speed up later overload checks.

The file count threshold is currently set to 1M files, but this can be
overridden for testing with GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT.

The assumption is that a separate trace-processing system is dealing
with the generated traces; once it processes and removes the sentinel
file, it should be safe to generate new trace files again.

Potential future work:
* Write a message into the sentinel file (should match the requested
  trace2 output format).
* Make the overload threshold (and the whole overload feature)
  configurable.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 t/t0210-trace2-normal.sh | 15 ++++++++
 trace2/tr2_dst.c         | 81 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index ce7574edb1..e8a03e9212 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -186,4 +186,19 @@ test_expect_success 'using global config with include' '
 	test_cmp expect actual
 '
 
+test_expect_success "don't overload target directory" '
+	GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 &&
+	export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT &&
+	test_when_finished "rm -r trace_target_dir" &&
+	mkdir trace_target_dir &&
+	test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt &&
+	xargs touch < expected_filenames.txt &&
+	ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt &&
+	test_cmp expected_filenames.txt first_ls_output.txt &&
+	GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 &&
+	echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt &&
+	ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt &&
+	test_cmp expected_filenames.txt second_ls_output.txt
+'
+
 test_done
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index 5dda0ca1cd..3286297918 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -1,3 +1,5 @@
+#include <dirent.h>
+
 #include "cache.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
@@ -8,6 +10,18 @@
  */
 #define MAX_AUTO_ATTEMPTS 10
 
+/*
+ * Sentinel file used to detect when we're overloading a directory with too many
+ * trace files.
+ */
+#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
+
+/*
+ * How many files we can write to a directory before entering overload mode.
+ * This can be overridden with the envvar GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT
+ */
+#define OVERLOAD_FILE_COUNT 1000000
+
 static int tr2_dst_want_warning(void)
 {
 	static int tr2env_dst_debug = -1;
@@ -32,6 +46,63 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
 	dst->need_close = 0;
 }
 
+/*
+ * Check to make sure we're not overloading the target directory with too many
+ * files. First check for the presence of a sentinel file, then check file
+ * count. If we are overloaded, create the sentinel file if it doesn't already
+ * exist.
+ *
+ * We expect that some trace processing system is gradually collecting files
+ * from the target directory; after it removes the sentinel file we'll start
+ * writing traces again.
+ */
+static int tr2_dst_overloaded(const char *tgt_prefix)
+{
+	int file_count = 0, overload_file_count = 0;
+	char *test_threshold_val;
+	DIR *dirp;
+	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
+	struct stat statbuf;
+
+	strbuf_addstr(&path, tgt_prefix);
+	if (!is_dir_sep(path.buf[path.len - 1])) {
+		strbuf_addch(&path, '/');
+	}
+
+	/* check sentinel */
+	strbuf_addstr(&sentinel_path, path.buf);
+	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
+	if (!stat(sentinel_path.buf, &statbuf)) {
+		strbuf_release(&path);
+		return 1;
+	}
+
+	/* check if we're overriding the threshold (e.g., for testing) */
+	test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT");
+	if (test_threshold_val)
+		overload_file_count = atoi(test_threshold_val);
+	if (overload_file_count <= 0)
+		overload_file_count = OVERLOAD_FILE_COUNT;
+
+
+	/* check file count */
+	dirp = opendir(path.buf);
+	while (file_count < overload_file_count && dirp && readdir(dirp))
+		file_count++;
+	if (dirp)
+		closedir(dirp);
+
+	if (file_count >= overload_file_count) {
+		creat(sentinel_path.buf, S_IRUSR | S_IWUSR);
+		/* TODO: Write a target-specific message? */
+		strbuf_release(&path);
+		return 1;
+	}
+
+	strbuf_release(&path);
+	return 0;
+}
+
 static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 {
 	int fd;
@@ -50,6 +121,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 	strbuf_addstr(&path, sid);
 	base_path_len = path.len;
 
+	if (tr2_dst_overloaded(tgt_prefix)) {
+		strbuf_release(&path);
+		if (tr2_dst_want_warning())
+			warning("trace2: not opening %s trace file due to too "
+				"many files in target directory %s",
+				tr2_sysenv_display_name(dst->sysenv_var),
+				tgt_prefix);
+		return 0;
+	}
+
 	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
 		if (attempt_count > 0) {
 			strbuf_setlen(&path, base_path_len);
-- 
2.22.0.709.g102302147b-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH] trace2: don't overload target directories
  2019-07-29 22:20 [RFC PATCH] trace2: don't overload target directories Josh Steadmon
@ 2019-07-30 13:29 ` Derrick Stolee
  2019-07-30 21:52   ` Josh Steadmon
  2019-07-30 16:46 ` Jeff Hostetler
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 40+ messages in thread
From: Derrick Stolee @ 2019-07-30 13:29 UTC (permalink / raw)
  To: Josh Steadmon, git

On 7/29/2019 6:20 PM, Josh Steadmon wrote:
> trace2 can write files into a target directory. With heavy usage, this
> directory can fill up with files, causing difficulty for
> trace-processing systems.
> 
> When trace2 would write a file to a target directory, first check
> whether or not the directory is overloaded. A directory is overloaded if
> there is a sentinel file declaring an overload, or if the number of
> files exceeds a threshold. If the latter, create a sentinel file to
> speed up later overload checks.
> 
> The file count threshold is currently set to 1M files, but this can be
> overridden for testing with GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT.

1 million seems like a LOT, and the environment variable seems to be only
for testing.

* If the variable is only for testing, then it should start with GIT_TEST_

* Are we sure 1 million is the right number? I would imagine even 10,000
  starting to be a problem. How would a user adjust this value if they
  are having problems before 1,000,000?

> The assumption is that a separate trace-processing system is dealing
> with the generated traces; once it processes and removes the sentinel
> file, it should be safe to generate new trace files again.

This matches the model that you (Google) are using for collecting logs.
I'll trust your expertise here in how backed up these logs become. I
imagine that someone working without a network connection for a long
time would be likely to run into this problem.

[snip]

> +test_expect_success "don't overload target directory" '
> +	GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 &&

For testing, does this need to be 100? Could it be 5?

> +	export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT &&

To avoid leakage to other (future) tests, should these be in a subshell?

> +	test_when_finished "rm -r trace_target_dir" &&
> +	mkdir trace_target_dir &&
> +	test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt &&
> +	xargs touch < expected_filenames.txt &&

nit: no space between redirection and filename.

> +	ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt &&
> +	test_cmp expected_filenames.txt first_ls_output.txt &&
> +	GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 &&
> +	echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt &&
> +	ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt &&
> +	test_cmp expected_filenames.txt second_ls_output.txt
> +'

[snip]

> +/*
> + * Check to make sure we're not overloading the target directory with too many
> + * files. First check for the presence of a sentinel file, then check file
> + * count. If we are overloaded, create the sentinel file if it doesn't already
> + * exist.
> + *
> + * We expect that some trace processing system is gradually collecting files
> + * from the target directory; after it removes the sentinel file we'll start
> + * writing traces again.
> + */
> +static int tr2_dst_overloaded(const char *tgt_prefix)
> +{
> +	int file_count = 0, overload_file_count = 0;
> +	char *test_threshold_val;
> +	DIR *dirp;
> +	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
> +	struct stat statbuf;
> +
> +	strbuf_addstr(&path, tgt_prefix);
> +	if (!is_dir_sep(path.buf[path.len - 1])) {
> +		strbuf_addch(&path, '/');
> +	}
> +
> +	/* check sentinel */
> +	strbuf_addstr(&sentinel_path, path.buf);
> +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> +	if (!stat(sentinel_path.buf, &statbuf)) {
> +		strbuf_release(&path);
> +		return 1;
> +	}
> +
> +	/* check if we're overriding the threshold (e.g., for testing) */
> +	test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT");
> +	if (test_threshold_val)
> +		overload_file_count = atoi(test_threshold_val);
> +	if (overload_file_count <= 0)
> +		overload_file_count = OVERLOAD_FILE_COUNT;
> +
> +
> +	/* check file count */
> +	dirp = opendir(path.buf);
> +	while (file_count < overload_file_count && dirp && readdir(dirp))
> +		file_count++;
> +	if (dirp)
> +		closedir(dirp);
> +
> +	if (file_count >= overload_file_count) {
> +		creat(sentinel_path.buf, S_IRUSR | S_IWUSR);
> +		/* TODO: Write a target-specific message? */

Perhaps leave the TODO out of the code? I did see it in your commit message.

> +		strbuf_release(&path);
> +		return 1;
> +	}
> +
> +	strbuf_release(&path);
> +	return 0;
> +}
> +
>  static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
>  {
>  	int fd;
> @@ -50,6 +121,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
>  	strbuf_addstr(&path, sid);
>  	base_path_len = path.len;
>  
> +	if (tr2_dst_overloaded(tgt_prefix)) {
> +		strbuf_release(&path);
> +		if (tr2_dst_want_warning())
> +			warning("trace2: not opening %s trace file due to too "
> +				"many files in target directory %s",
> +				tr2_sysenv_display_name(dst->sysenv_var),
> +				tgt_prefix);
> +		return 0;
> +	}
> +
>  	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
>  		if (attempt_count > 0) {
>  			strbuf_setlen(&path, base_path_len);
> 

Overall, this looks correct and the test is very clear. Seems to be a helpful feature!

I only have the nits mentioned above.

Thanks!
-Stolee


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH] trace2: don't overload target directories
  2019-07-29 22:20 [RFC PATCH] trace2: don't overload target directories Josh Steadmon
  2019-07-30 13:29 ` Derrick Stolee
@ 2019-07-30 16:46 ` Jeff Hostetler
  2019-07-30 22:01   ` Josh Steadmon
  2019-07-30 22:02   ` Josh Steadmon
  2019-07-30 18:00 ` Jeff Hostetler
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 40+ messages in thread
From: Jeff Hostetler @ 2019-07-30 16:46 UTC (permalink / raw)
  To: Josh Steadmon, git



On 7/29/2019 6:20 PM, Josh Steadmon wrote:
> trace2 can write files into a target directory. With heavy usage, this
> directory can fill up with files, causing difficulty for
> trace-processing systems.
> 
> When trace2 would write a file to a target directory, first check
> whether or not the directory is overloaded. A directory is overloaded if
> there is a sentinel file declaring an overload, or if the number of
> files exceeds a threshold. If the latter, create a sentinel file to
> speed up later overload checks.

Something about this idea bothers me, but I can't quite put my finger
on it.  You're filling a directory with thousands of files while
(hopefully simultaneously) having a post-processor/aggregator app
read and delete them.  I understand that if the aggregator falls
behind or isn't running, the files will just accumulate and that the
total number of files is the problem.  But I have to wonder if
contention on that directory is going to be a bottleneck and/or
a source of problems.  That is, you'll have one process reading and
deleting and one or more Git processes scanning/counting/creating.
It seems like there might be opportunity for some kinds of races
here.

It have to wonder if it would be better to do some kind of directory
rotation rather than create a marker file.

Alternatively, I think it would be better to not have the marker
file be inside the directory, but rather have a lock file somewhere
to temporarily disable tracing.  Then your stat() call would not need
to effectively search the large directory.  Maybe make this
"<dirname>.lock" as a peer to "<dirname>/", for example.


> 
> The file count threshold is currently set to 1M files, but this can be
> overridden for testing with GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT.
> 
> The assumption is that a separate trace-processing system is dealing
> with the generated traces; once it processes and removes the sentinel
> file, it should be safe to generate new trace files again.
> 
> Potential future work:
> * Write a message into the sentinel file (should match the requested
>    trace2 output format).
> * Make the overload threshold (and the whole overload feature)
>    configurable.

I'm wondering if we should just make this setting another
value in `tr2_sysenv_settings[]` rather than a *_TEST_* env var.

That would give you both env and system/global config support,
since I'm assuming you'd eventually want to have this be in
the user's global config with the other trace2 settings.

All of your tests could be expressed in terms of this new setting
and we wouldn't need this new test env var.

> 
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>   t/t0210-trace2-normal.sh | 15 ++++++++
>   trace2/tr2_dst.c         | 81 ++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 96 insertions(+)
> 
> diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
> index ce7574edb1..e8a03e9212 100755
> --- a/t/t0210-trace2-normal.sh
> +++ b/t/t0210-trace2-normal.sh
> @@ -186,4 +186,19 @@ test_expect_success 'using global config with include' '
>   	test_cmp expect actual
>   '
>   
> +test_expect_success "don't overload target directory" '
> +	GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 &&
> +	export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT &&
> +	test_when_finished "rm -r trace_target_dir" &&
> +	mkdir trace_target_dir &&
> +	test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt &&
> +	xargs touch < expected_filenames.txt &&
> +	ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt &&
> +	test_cmp expected_filenames.txt first_ls_output.txt &&
> +	GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 &&
> +	echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt &&
> +	ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt &&
> +	test_cmp expected_filenames.txt second_ls_output.txt
> +'
> +
>   test_done
> diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
> index 5dda0ca1cd..3286297918 100644
> --- a/trace2/tr2_dst.c
> +++ b/trace2/tr2_dst.c
> @@ -1,3 +1,5 @@
> +#include <dirent.h>
> +
>   #include "cache.h"
>   #include "trace2/tr2_dst.h"
>   #include "trace2/tr2_sid.h"
> @@ -8,6 +10,18 @@
>    */
>   #define MAX_AUTO_ATTEMPTS 10
>   
> +/*
> + * Sentinel file used to detect when we're overloading a directory with too many
> + * trace files.
> + */
> +#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
> +
> +/*
> + * How many files we can write to a directory before entering overload mode.
> + * This can be overridden with the envvar GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT
> + */
> +#define OVERLOAD_FILE_COUNT 1000000
> +
>   static int tr2_dst_want_warning(void)
>   {
>   	static int tr2env_dst_debug = -1;
> @@ -32,6 +46,63 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
>   	dst->need_close = 0;
>   }
>   
> +/*
> + * Check to make sure we're not overloading the target directory with too many
> + * files. First check for the presence of a sentinel file, then check file
> + * count. If we are overloaded, create the sentinel file if it doesn't already
> + * exist.
> + *
> + * We expect that some trace processing system is gradually collecting files
> + * from the target directory; after it removes the sentinel file we'll start
> + * writing traces again.
> + */
> +static int tr2_dst_overloaded(const char *tgt_prefix)
> +{
> +	int file_count = 0, overload_file_count = 0;
> +	char *test_threshold_val;
> +	DIR *dirp;
> +	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
> +	struct stat statbuf;
> +
> +	strbuf_addstr(&path, tgt_prefix);
> +	if (!is_dir_sep(path.buf[path.len - 1])) {
> +		strbuf_addch(&path, '/');
> +	}
> +
> +	/* check sentinel */
> +	strbuf_addstr(&sentinel_path, path.buf);
> +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> +	if (!stat(sentinel_path.buf, &statbuf)) {
> +		strbuf_release(&path);

Also release sentinel_path ?
(And in both of the return statements below.)

> +		return 1;
> +	}
> +
> +	/* check if we're overriding the threshold (e.g., for testing) */
> +	test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT");
> +	if (test_threshold_val)
> +		overload_file_count = atoi(test_threshold_val);
> +	if (overload_file_count <= 0)
> +		overload_file_count = OVERLOAD_FILE_COUNT;
> +
> +
> +	/* check file count */
> +	dirp = opendir(path.buf);
> +	while (file_count < overload_file_count && dirp && readdir(dirp))
> +		file_count++;
> +	if (dirp)
> +		closedir(dirp);
> +
> +	if (file_count >= overload_file_count) {
> +		creat(sentinel_path.buf, S_IRUSR | S_IWUSR);
> +		/* TODO: Write a target-specific message? */
> +		strbuf_release(&path);
> +		return 1;
> +	}
> +
> +	strbuf_release(&path);
> +	return 0;
> +}
> +
>   static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
>   {
>   	int fd;
> @@ -50,6 +121,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
>   	strbuf_addstr(&path, sid);
>   	base_path_len = path.len;
>   
> +	if (tr2_dst_overloaded(tgt_prefix)) {
> +		strbuf_release(&path);
> +		if (tr2_dst_want_warning())
> +			warning("trace2: not opening %s trace file due to too "
> +				"many files in target directory %s",
> +				tr2_sysenv_display_name(dst->sysenv_var),
> +				tgt_prefix);
> +		return 0;
> +	}
> +
>   	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
>   		if (attempt_count > 0) {
>   			strbuf_setlen(&path, base_path_len);
> 

hope this helps,
Jeff


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH] trace2: don't overload target directories
  2019-07-29 22:20 [RFC PATCH] trace2: don't overload target directories Josh Steadmon
  2019-07-30 13:29 ` Derrick Stolee
  2019-07-30 16:46 ` Jeff Hostetler
@ 2019-07-30 18:00 ` Jeff Hostetler
  2019-07-30 22:08   ` Josh Steadmon
  2019-08-02 22:02 ` [RFC PATCH v2 0/2] " Josh Steadmon
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 40+ messages in thread
From: Jeff Hostetler @ 2019-07-30 18:00 UTC (permalink / raw)
  To: Josh Steadmon, git



On 7/29/2019 6:20 PM, Josh Steadmon wrote:
> trace2 can write files into a target directory. With heavy usage, this
> directory can fill up with files, causing difficulty for
> trace-processing systems.

I'm routing data in my org to a daemon via a Named Pipe or UD Socket,
so I'm not seeing the thousands of files problems that you're seeing.
However, we were being overwhelmed with lots of "uninteresting" commands
and so I added some whitelisting to my post-processing daemon.  For
example, I want to know about checkout and push times -- I really don't
care about rev-parse or config times or other such minor commands.

I went one step further and allow either "(cmd_name)" or
the pair "(cmd_name, cmd_mode)".  This lets me select all checkouts
and limit checkouts to branch-changing ones, for example.  I drop
any events in my post-processor that does not match any of my whitelist
patterns.

Perhaps you could run a quick histogram and see if something would
be useful to pre-filter the data.  That is, if we had whitelisting
within git.exe itself, would you still have too much data and/or
would you still need the overload feature that you've proposed in
this RFC?

Thanks,
Jeff


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH] trace2: don't overload target directories
  2019-07-30 13:29 ` Derrick Stolee
@ 2019-07-30 21:52   ` Josh Steadmon
  0 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-07-30 21:52 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

On 2019.07.30 09:29, Derrick Stolee wrote:
> On 7/29/2019 6:20 PM, Josh Steadmon wrote:
> > trace2 can write files into a target directory. With heavy usage, this
> > directory can fill up with files, causing difficulty for
> > trace-processing systems.
> > 
> > When trace2 would write a file to a target directory, first check
> > whether or not the directory is overloaded. A directory is overloaded if
> > there is a sentinel file declaring an overload, or if the number of
> > files exceeds a threshold. If the latter, create a sentinel file to
> > speed up later overload checks.
> > 
> > The file count threshold is currently set to 1M files, but this can be
> > overridden for testing with GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT.
> 
> 1 million seems like a LOT, and the environment variable seems to be only
> for testing.
> 
> * If the variable is only for testing, then it should start with GIT_TEST_
> 
> * Are we sure 1 million is the right number? I would imagine even 10,000
>   starting to be a problem. How would a user adjust this value if they
>   are having problems before 1,000,000?

Yeah. I think we've only had reports of trouble starting around 5
million files. This definitely feels more like a config variable, but on
the other hand I thought there was some resistance towards adding the
somewhat special early-initialization trace config variables. So for the
first revision I figured I'd just throw out a constant and see if there
were any objections.

If people feel like it's OK to add this to the early trace2 config
options, then I'd be happy to do that in V2.

> > The assumption is that a separate trace-processing system is dealing
> > with the generated traces; once it processes and removes the sentinel
> > file, it should be safe to generate new trace files again.
> 
> This matches the model that you (Google) are using for collecting logs.
> I'll trust your expertise here in how backed up these logs become. I
> imagine that someone working without a network connection for a long
> time would be likely to run into this problem.
>
> [snip]
> 
> > +test_expect_success "don't overload target directory" '
> > +	GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 &&
> 
> For testing, does this need to be 100? Could it be 5?

Sure, changed to 5 in V2.


> > +	export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT &&
> 
> To avoid leakage to other (future) tests, should these be in a subshell?

Yes, thanks for the catch. Fixed in V2.


> > +	test_when_finished "rm -r trace_target_dir" &&
> > +	mkdir trace_target_dir &&
> > +	test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt &&
> > +	xargs touch < expected_filenames.txt &&
> 
> nit: no space between redirection and filename.

Fixed in V2.


> > +	ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt &&
> > +	test_cmp expected_filenames.txt first_ls_output.txt &&
> > +	GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 &&
> > +	echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt &&
> > +	ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt &&
> > +	test_cmp expected_filenames.txt second_ls_output.txt
> > +'
> 
> [snip]
> 
> > +/*
> > + * Check to make sure we're not overloading the target directory with too many
> > + * files. First check for the presence of a sentinel file, then check file
> > + * count. If we are overloaded, create the sentinel file if it doesn't already
> > + * exist.
> > + *
> > + * We expect that some trace processing system is gradually collecting files
> > + * from the target directory; after it removes the sentinel file we'll start
> > + * writing traces again.
> > + */
> > +static int tr2_dst_overloaded(const char *tgt_prefix)
> > +{
> > +	int file_count = 0, overload_file_count = 0;
> > +	char *test_threshold_val;
> > +	DIR *dirp;
> > +	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
> > +	struct stat statbuf;
> > +
> > +	strbuf_addstr(&path, tgt_prefix);
> > +	if (!is_dir_sep(path.buf[path.len - 1])) {
> > +		strbuf_addch(&path, '/');
> > +	}
> > +
> > +	/* check sentinel */
> > +	strbuf_addstr(&sentinel_path, path.buf);
> > +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> > +	if (!stat(sentinel_path.buf, &statbuf)) {
> > +		strbuf_release(&path);
> > +		return 1;
> > +	}
> > +
> > +	/* check if we're overriding the threshold (e.g., for testing) */
> > +	test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT");
> > +	if (test_threshold_val)
> > +		overload_file_count = atoi(test_threshold_val);
> > +	if (overload_file_count <= 0)
> > +		overload_file_count = OVERLOAD_FILE_COUNT;
> > +
> > +
> > +	/* check file count */
> > +	dirp = opendir(path.buf);
> > +	while (file_count < overload_file_count && dirp && readdir(dirp))
> > +		file_count++;
> > +	if (dirp)
> > +		closedir(dirp);
> > +
> > +	if (file_count >= overload_file_count) {
> > +		creat(sentinel_path.buf, S_IRUSR | S_IWUSR);
> > +		/* TODO: Write a target-specific message? */
> 
> Perhaps leave the TODO out of the code? I did see it in your commit message.

Fixed in V2.


> > +		strbuf_release(&path);
> > +		return 1;
> > +	}
> > +
> > +	strbuf_release(&path);
> > +	return 0;
> > +}
> > +
> >  static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
> >  {
> >  	int fd;
> > @@ -50,6 +121,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
> >  	strbuf_addstr(&path, sid);
> >  	base_path_len = path.len;
> >  
> > +	if (tr2_dst_overloaded(tgt_prefix)) {
> > +		strbuf_release(&path);
> > +		if (tr2_dst_want_warning())
> > +			warning("trace2: not opening %s trace file due to too "
> > +				"many files in target directory %s",
> > +				tr2_sysenv_display_name(dst->sysenv_var),
> > +				tgt_prefix);
> > +		return 0;
> > +	}
> > +
> >  	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
> >  		if (attempt_count > 0) {
> >  			strbuf_setlen(&path, base_path_len);
> > 
> 
> Overall, this looks correct and the test is very clear. Seems to be a helpful feature!
> 
> I only have the nits mentioned above.

Thanks for the review!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH] trace2: don't overload target directories
  2019-07-30 16:46 ` Jeff Hostetler
@ 2019-07-30 22:01   ` Josh Steadmon
  2019-07-30 22:02   ` Josh Steadmon
  1 sibling, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-07-30 22:01 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git

On 2019.07.30 12:46, Jeff Hostetler wrote:
> 
> 
> On 7/29/2019 6:20 PM, Josh Steadmon wrote:
> > trace2 can write files into a target directory. With heavy usage, this
> > directory can fill up with files, causing difficulty for
> > trace-processing systems.
> > 
> > When trace2 would write a file to a target directory, first check
> > whether or not the directory is overloaded. A directory is overloaded if
> > there is a sentinel file declaring an overload, or if the number of
> > files exceeds a threshold. If the latter, create a sentinel file to
> > speed up later overload checks.
> 
> Something about this idea bothers me, but I can't quite put my finger
> on it.  You're filling a directory with thousands of files while
> (hopefully simultaneously) having a post-processor/aggregator app
> read and delete them.  I understand that if the aggregator falls
> behind or isn't running, the files will just accumulate and that the
> total number of files is the problem.  But I have to wonder if
> contention on that directory is going to be a bottleneck and/or
> a source of problems.  That is, you'll have one process reading and
> deleting and one or more Git processes scanning/counting/creating.
> It seems like there might be opportunity for some kinds of races
> here.

Yeah, this probably deserves some performance testing. I'll see what I
can arrange prior to sending out V2. I don't think that racing issue is
that big a deal, as there's not a ton of difference if we stop tracing
after (say) 10**6 - 100 files vs 10**6 + 100 or whatever. But contention
from multiple processes could definitely slow things down.


> It have to wonder if it would be better to do some kind of directory
> rotation rather than create a marker file.
> 
> Alternatively, I think it would be better to not have the marker
> file be inside the directory, but rather have a lock file somewhere
> to temporarily disable tracing.  Then your stat() call would not need
> to effectively search the large directory.  Maybe make this
> "<dirname>.lock" as a peer to "<dirname>/", for example.

Yeah, this would make the stat() faster. But we'd have to add logic
either to git or to the collection system to clean up the .lock files
after we empty out the target directory. I'll have to think about this
and see if the collection team is open to this.


> > The file count threshold is currently set to 1M files, but this can be
> > overridden for testing with GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT.
> > 
> > The assumption is that a separate trace-processing system is dealing
> > with the generated traces; once it processes and removes the sentinel
> > file, it should be safe to generate new trace files again.
> > 
> > Potential future work:
> > * Write a message into the sentinel file (should match the requested
> >    trace2 output format).
> > * Make the overload threshold (and the whole overload feature)
> >    configurable.
> 
> I'm wondering if we should just make this setting another
> value in `tr2_sysenv_settings[]` rather than a *_TEST_* env var.
> 
> That would give you both env and system/global config support,
> since I'm assuming you'd eventually want to have this be in
> the user's global config with the other trace2 settings.
> 
> All of your tests could be expressed in terms of this new setting
> and we wouldn't need this new test env var.

I do think this makes more sense as a config variable, but as I
mentioned in my reply to Stolee, my impression was that we were somewhat
hesitant to add new trace config variables if they have to go through
the special early config initialization. If that impression is wrong
I'll add the config var.

> > 
> > Signed-off-by: Josh Steadmon <steadmon@google.com>
> > ---
> >   t/t0210-trace2-normal.sh | 15 ++++++++
> >   trace2/tr2_dst.c         | 81 ++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 96 insertions(+)
> > 
> > diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
> > index ce7574edb1..e8a03e9212 100755
> > --- a/t/t0210-trace2-normal.sh
> > +++ b/t/t0210-trace2-normal.sh
> > @@ -186,4 +186,19 @@ test_expect_success 'using global config with include' '
> >   	test_cmp expect actual
> >   '
> > +test_expect_success "don't overload target directory" '
> > +	GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 &&
> > +	export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT &&
> > +	test_when_finished "rm -r trace_target_dir" &&
> > +	mkdir trace_target_dir &&
> > +	test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt &&
> > +	xargs touch < expected_filenames.txt &&
> > +	ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt &&
> > +	test_cmp expected_filenames.txt first_ls_output.txt &&
> > +	GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 &&
> > +	echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt &&
> > +	ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt &&
> > +	test_cmp expected_filenames.txt second_ls_output.txt
> > +'
> > +
> >   test_done
> > diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
> > index 5dda0ca1cd..3286297918 100644
> > --- a/trace2/tr2_dst.c
> > +++ b/trace2/tr2_dst.c
> > @@ -1,3 +1,5 @@
> > +#include <dirent.h>
> > +
> >   #include "cache.h"
> >   #include "trace2/tr2_dst.h"
> >   #include "trace2/tr2_sid.h"
> > @@ -8,6 +10,18 @@
> >    */
> >   #define MAX_AUTO_ATTEMPTS 10
> > +/*
> > + * Sentinel file used to detect when we're overloading a directory with too many
> > + * trace files.
> > + */
> > +#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
> > +
> > +/*
> > + * How many files we can write to a directory before entering overload mode.
> > + * This can be overridden with the envvar GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT
> > + */
> > +#define OVERLOAD_FILE_COUNT 1000000
> > +
> >   static int tr2_dst_want_warning(void)
> >   {
> >   	static int tr2env_dst_debug = -1;
> > @@ -32,6 +46,63 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
> >   	dst->need_close = 0;
> >   }
> > +/*
> > + * Check to make sure we're not overloading the target directory with too many
> > + * files. First check for the presence of a sentinel file, then check file
> > + * count. If we are overloaded, create the sentinel file if it doesn't already
> > + * exist.
> > + *
> > + * We expect that some trace processing system is gradually collecting files
> > + * from the target directory; after it removes the sentinel file we'll start
> > + * writing traces again.
> > + */
> > +static int tr2_dst_overloaded(const char *tgt_prefix)
> > +{
> > +	int file_count = 0, overload_file_count = 0;
> > +	char *test_threshold_val;
> > +	DIR *dirp;
> > +	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
> > +	struct stat statbuf;
> > +
> > +	strbuf_addstr(&path, tgt_prefix);
> > +	if (!is_dir_sep(path.buf[path.len - 1])) {
> > +		strbuf_addch(&path, '/');
> > +	}
> > +
> > +	/* check sentinel */
> > +	strbuf_addstr(&sentinel_path, path.buf);
> > +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> > +	if (!stat(sentinel_path.buf, &statbuf)) {
> > +		strbuf_release(&path);
> 
> Also release sentinel_path ?
> (And in both of the return statements below.)
> 
> > +		return 1;
> > +	}
> > +
> > +	/* check if we're overriding the threshold (e.g., for testing) */
> > +	test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT");
> > +	if (test_threshold_val)
> > +		overload_file_count = atoi(test_threshold_val);
> > +	if (overload_file_count <= 0)
> > +		overload_file_count = OVERLOAD_FILE_COUNT;
> > +
> > +
> > +	/* check file count */
> > +	dirp = opendir(path.buf);
> > +	while (file_count < overload_file_count && dirp && readdir(dirp))
> > +		file_count++;
> > +	if (dirp)
> > +		closedir(dirp);
> > +
> > +	if (file_count >= overload_file_count) {
> > +		creat(sentinel_path.buf, S_IRUSR | S_IWUSR);
> > +		/* TODO: Write a target-specific message? */
> > +		strbuf_release(&path);
> > +		return 1;
> > +	}
> > +
> > +	strbuf_release(&path);
> > +	return 0;
> > +}
> > +
> >   static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
> >   {
> >   	int fd;
> > @@ -50,6 +121,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
> >   	strbuf_addstr(&path, sid);
> >   	base_path_len = path.len;
> > +	if (tr2_dst_overloaded(tgt_prefix)) {
> > +		strbuf_release(&path);
> > +		if (tr2_dst_want_warning())
> > +			warning("trace2: not opening %s trace file due to too "
> > +				"many files in target directory %s",
> > +				tr2_sysenv_display_name(dst->sysenv_var),
> > +				tgt_prefix);
> > +		return 0;
> > +	}
> > +
> >   	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
> >   		if (attempt_count > 0) {
> >   			strbuf_setlen(&path, base_path_len);
> > 
> 
> hope this helps,
> Jeff
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH] trace2: don't overload target directories
  2019-07-30 16:46 ` Jeff Hostetler
  2019-07-30 22:01   ` Josh Steadmon
@ 2019-07-30 22:02   ` Josh Steadmon
  1 sibling, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-07-30 22:02 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git

On 2019.07.30 12:46, Jeff Hostetler wrote:
[snip]
> Also release sentinel_path ?
> (And in both of the return statements below.)

Thanks for the catch. Fixed in V2.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH] trace2: don't overload target directories
  2019-07-30 18:00 ` Jeff Hostetler
@ 2019-07-30 22:08   ` Josh Steadmon
  0 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-07-30 22:08 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git

On 2019.07.30 14:00, Jeff Hostetler wrote:
> 
> 
> On 7/29/2019 6:20 PM, Josh Steadmon wrote:
> > trace2 can write files into a target directory. With heavy usage, this
> > directory can fill up with files, causing difficulty for
> > trace-processing systems.
> 
> I'm routing data in my org to a daemon via a Named Pipe or UD Socket,
> so I'm not seeing the thousands of files problems that you're seeing.
> However, we were being overwhelmed with lots of "uninteresting" commands
> and so I added some whitelisting to my post-processing daemon.  For
> example, I want to know about checkout and push times -- I really don't
> care about rev-parse or config times or other such minor commands.
> 
> I went one step further and allow either "(cmd_name)" or
> the pair "(cmd_name, cmd_mode)".  This lets me select all checkouts
> and limit checkouts to branch-changing ones, for example.  I drop
> any events in my post-processor that does not match any of my whitelist
> patterns.
> 
> Perhaps you could run a quick histogram and see if something would
> be useful to pre-filter the data.  That is, if we had whitelisting
> within git.exe itself, would you still have too much data and/or
> would you still need the overload feature that you've proposed in
> this RFC?

Our problem is not so much with the volume of data as the fact that a
few special users have a ton of git invocations in rapid succession,
each of which creates a new trace file. If we had a more synchronous
collection system like yours it would probably not be so much of a
challenge. But the massive number of files created in a short timeframe
revealed some inefficiencies in our collection system (which are
thankfully being addressed by the owners of that code). But we probably
still want some sort of overload prevention feature as long as we're
using the target directory approach.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH v2 0/2] trace2: don't overload target directories
  2019-07-29 22:20 [RFC PATCH] trace2: don't overload target directories Josh Steadmon
                   ` (2 preceding siblings ...)
  2019-07-30 18:00 ` Jeff Hostetler
@ 2019-08-02 22:02 ` Josh Steadmon
  2019-08-02 22:02   ` [RFC PATCH v2 1/2] docs: mention trace2 target-dir mode in git-config Josh Steadmon
  2019-08-02 22:02   ` [RFC PATCH v2 2/2] trace2: don't overload target directories Josh Steadmon
  2019-09-14  0:25 ` [RFC PATCH v3 0/3] " Josh Steadmon
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-08-02 22:02 UTC (permalink / raw)
  To: git; +Cc: stolee, git

I'm sending out V2 still as an RFC because I haven't yet had time to
check that directory contention doesn't create problems with multiple
processes sharing the same target directory. I'll be on vacation for the
next couple of weeks, so I wanted to get the new config variable version
out before then.

I noticed that git-config doesn't currently mention target directories,
so I've included a patch to fix that as well.

Changes since V1:
* Adds a new patch that includes a description of trace2's
  target-directory mode in the git-config documentation.
* Moves the file count threshold from a #define constant to a config
  option.
* Renames the threshold override envvar to be consistent with other
  trace2 envvars.
* Simplified the new test case in t0210.

Josh Steadmon (2):
  docs: mention trace2 target-dir mode in git-config
  trace2: don't overload target directories

 Documentation/config/trace2.txt        |  6 ++
 Documentation/technical/api-trace2.txt |  7 +--
 Documentation/trace2-target-values.txt |  4 +-
 t/t0210-trace2-normal.sh               | 19 ++++++
 trace2/tr2_dst.c                       | 86 ++++++++++++++++++++++++++
 trace2/tr2_sysenv.c                    |  3 +
 trace2/tr2_sysenv.h                    |  2 +
 7 files changed, 122 insertions(+), 5 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  65e05a3db5 docs: mention trace2 target-dir mode in git-config
1:  99e4a0fe40 ! 2:  a779e272df trace2: don't overload target directories
    @@ Commit message
     
     
    + ## Documentation/config/trace2.txt ##
    +@@ Documentation/config/trace2.txt: trace2.destinationDebug::
    + 	By default, these errors are suppressed and tracing is
    + 	silently disabled.  May be overridden by the
    + 	`GIT_TRACE2_DST_DEBUG` environment variable.
    ++
    ++trace2.maxFiles::
    ++	Integer.  When writing trace files to a target directory, do not
    ++	write additional traces if we would exceed this many files. Instead,
    ++	write a sentinel file that will block further tracing to this
    ++	directory. Defaults to 0, which disables this check.
    +
      ## t/t0210-trace2-normal.sh ##
     @@ t/t0210-trace2-normal.sh: test_expect_success 'using global config with include' '
      	test_cmp expect actual
      '
      
     +test_expect_success "don't overload target directory" '
    -+	GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 &&
    -+	export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT &&
    -+	test_when_finished "rm -r trace_target_dir" &&
     +	mkdir trace_target_dir &&
    -+	test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt &&
    -+	xargs touch < expected_filenames.txt &&
    -+	ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt &&
    -+	test_cmp expected_filenames.txt first_ls_output.txt &&
    -+	GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 &&
    -+	echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt &&
    -+	ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt &&
    ++	test_when_finished "rm -r trace_target_dir" &&
    ++	(
    ++		GIT_TRACE2_MAX_FILES=5 &&
    ++		export GIT_TRACE2_MAX_FILES &&
    ++		cd trace_target_dir &&
    ++		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
    ++		xargs touch <../expected_filenames.txt &&
    ++		cd .. &&
    ++		ls trace_target_dir >first_ls_output.txt &&
    ++		test_cmp expected_filenames.txt first_ls_output.txt &&
    ++		GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0
    ++	) &&
    ++	echo git-trace2-overload >>expected_filenames.txt &&
    ++	ls trace_target_dir >second_ls_output.txt &&
     +	test_cmp expected_filenames.txt second_ls_output.txt
     +'
     +
    @@ trace2/tr2_dst.c
     +#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
     +
     +/*
    -+ * How many files we can write to a directory before entering overload mode.
    -+ * This can be overridden with the envvar GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT
    ++ * When set to zero, disables directory overload checking. Otherwise, controls
    ++ * how many files we can write to a directory before entering overload mode.
    ++ * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
     + */
    -+#define OVERLOAD_FILE_COUNT 1000000
    ++static int tr2env_max_files = 0;
     +
      static int tr2_dst_want_warning(void)
      {
    @@ trace2/tr2_dst.c: void tr2_dst_trace_disable(struct tr2_dst *dst)
      
     +/*
     + * Check to make sure we're not overloading the target directory with too many
    -+ * files. First check for the presence of a sentinel file, then check file
    -+ * count. If we are overloaded, create the sentinel file if it doesn't already
    -+ * exist.
    ++ * files. First get the threshold (if present) from the config or envvar. If
    ++ * it's zero or unset, disable this check.  Next check for the presence of a
    ++ * sentinel file, then check file count. If we are overloaded, create the
    ++ * sentinel file if it doesn't already exist.
     + *
     + * We expect that some trace processing system is gradually collecting files
     + * from the target directory; after it removes the sentinel file we'll start
    @@ trace2/tr2_dst.c: void tr2_dst_trace_disable(struct tr2_dst *dst)
     + */
     +static int tr2_dst_overloaded(const char *tgt_prefix)
     +{
    -+	int file_count = 0, overload_file_count = 0;
    -+	char *test_threshold_val;
    ++	int file_count = 0, max_files = 0, ret = 0;
    ++	const char *max_files_var;
     +	DIR *dirp;
     +	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
     +	struct stat statbuf;
    @@ trace2/tr2_dst.c: void tr2_dst_trace_disable(struct tr2_dst *dst)
     +		strbuf_addch(&path, '/');
     +	}
     +
    ++	/* Get the config or envvar and decide if we should continue this check */
    ++	max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
    ++	if (max_files_var && *max_files_var && ((max_files = atoi(max_files_var)) >= 0))
    ++		tr2env_max_files = max_files;
    ++
    ++	if (!tr2env_max_files) {
    ++		ret = 0;
    ++		goto cleanup;
    ++	}
    ++
     +	/* check sentinel */
     +	strbuf_addstr(&sentinel_path, path.buf);
     +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
     +	if (!stat(sentinel_path.buf, &statbuf)) {
    -+		strbuf_release(&path);
    -+		return 1;
    ++		ret = 1;
    ++		goto cleanup;
     +	}
     +
    -+	/* check if we're overriding the threshold (e.g., for testing) */
    -+	test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT");
    -+	if (test_threshold_val)
    -+		overload_file_count = atoi(test_threshold_val);
    -+	if (overload_file_count <= 0)
    -+		overload_file_count = OVERLOAD_FILE_COUNT;
    -+
    -+
     +	/* check file count */
     +	dirp = opendir(path.buf);
    -+	while (file_count < overload_file_count && dirp && readdir(dirp))
    ++	while (file_count < tr2env_max_files && dirp && readdir(dirp))
     +		file_count++;
     +	if (dirp)
     +		closedir(dirp);
     +
    -+	if (file_count >= overload_file_count) {
    ++	if (file_count >= tr2env_max_files) {
     +		creat(sentinel_path.buf, S_IRUSR | S_IWUSR);
    -+		/* TODO: Write a target-specific message? */
    -+		strbuf_release(&path);
    -+		return 1;
    ++		ret = 1;
    ++		goto cleanup;
     +	}
     +
    ++cleanup:
     +	strbuf_release(&path);
    -+	return 0;
    ++	strbuf_release(&sentinel_path);
    ++	return ret;
     +}
     +
      static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
    @@ trace2/tr2_dst.c: static int tr2_dst_try_auto_path(struct tr2_dst *dst, const ch
      	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
      		if (attempt_count > 0) {
      			strbuf_setlen(&path, base_path_len);
    +
    + ## trace2/tr2_sysenv.c ##
    +@@ trace2/tr2_sysenv.c: static struct tr2_sysenv_entry tr2_sysenv_settings[] = {
    + 				       "trace2.perftarget" },
    + 	[TR2_SYSENV_PERF_BRIEF]    = { "GIT_TRACE2_PERF_BRIEF",
    + 				       "trace2.perfbrief" },
    ++
    ++	[TR2_SYSENV_MAX_FILES]     = { "GIT_TRACE2_MAX_FILES",
    ++				       "trace2.maxfiles" },
    + };
    + /* clang-format on */
    + 
    +
    + ## trace2/tr2_sysenv.h ##
    +@@ trace2/tr2_sysenv.h: enum tr2_sysenv_variable {
    + 	TR2_SYSENV_PERF,
    + 	TR2_SYSENV_PERF_BRIEF,
    + 
    ++	TR2_SYSENV_MAX_FILES,
    ++
    + 	TR2_SYSENV_MUST_BE_LAST
    + };
    + 
--
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH v2 1/2] docs: mention trace2 target-dir mode in git-config
  2019-08-02 22:02 ` [RFC PATCH v2 0/2] " Josh Steadmon
@ 2019-08-02 22:02   ` Josh Steadmon
  2019-08-02 22:02   ` [RFC PATCH v2 2/2] trace2: don't overload target directories Josh Steadmon
  1 sibling, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-08-02 22:02 UTC (permalink / raw)
  To: git; +Cc: stolee, git

Move the description of trace2's target-directory behavior into the
shared trace2-target-values file so that it is included in both the
git-config and api-trace2 docs. Leave the SID discussion only in
api-trace2 since it's a technical detail.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt | 7 +++----
 Documentation/trace2-target-values.txt | 4 +++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index f7ffe7d599..b831d65460 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -142,10 +142,9 @@ system or global config value to one of the following:
 
 include::../trace2-target-values.txt[]
 
-If the target already exists and is a directory, the traces will be
-written to files (one per process) underneath the given directory. They
-will be named according to the last component of the SID (optionally
-followed by a counter to avoid filename collisions).
+When trace files are written to a target directory, they will be named according
+to the last component of the SID (optionally followed by a counter to avoid
+filename collisions).
 
 == Trace2 API
 
diff --git a/Documentation/trace2-target-values.txt b/Documentation/trace2-target-values.txt
index 27d3c64e66..3985b6d3c2 100644
--- a/Documentation/trace2-target-values.txt
+++ b/Documentation/trace2-target-values.txt
@@ -2,7 +2,9 @@
 * `0` or `false` - Disables the target.
 * `1` or `true` - Writes to `STDERR`.
 * `[2-9]` - Writes to the already opened file descriptor.
-* `<absolute-pathname>` - Writes to the file in append mode.
+* `<absolute-pathname>` - Writes to the file in append mode. If the target
+already exists and is a directory, the traces will be written to files (one
+per process) underneath the given directory.
 * `af_unix:[<socket_type>:]<absolute-pathname>` - Write to a
 Unix DomainSocket (on platforms that support them).  Socket
 type can be either `stream` or `dgram`; if omitted Git will
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH v2 2/2] trace2: don't overload target directories
  2019-08-02 22:02 ` [RFC PATCH v2 0/2] " Josh Steadmon
  2019-08-02 22:02   ` [RFC PATCH v2 1/2] docs: mention trace2 target-dir mode in git-config Josh Steadmon
@ 2019-08-02 22:02   ` Josh Steadmon
  2019-08-05 15:34     ` Jeff Hostetler
  2019-08-05 18:01     ` SZEDER Gábor
  1 sibling, 2 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-08-02 22:02 UTC (permalink / raw)
  To: git; +Cc: stolee, git

trace2 can write files into a target directory. With heavy usage, this
directory can fill up with files, causing difficulty for
trace-processing systems.

This patch adds a config option (trace2.maxFiles) to set a maximum
number of files that trace2 will write to a target directory. The
following behavior is enabled when the maxFiles is set to a positive
integer:
  When trace2 would write a file to a target directory, first check
  whether or not the directory is overloaded. A directory is overloaded
  if there is a sentinel file declaring an overload, or if the number of
  files exceeds trace2.maxFiles. If the latter, create a sentinel file
  to speed up later overload checks.

The assumption is that a separate trace-processing system is dealing
with the generated traces; once it processes and removes the sentinel
file, it should be safe to generate new trace files again.

The default value for trace2.maxFiles is zero, which disables the
overload check.

The config can also be overridden with a new environment variable:
GIT_TRACE2_MAX_FILES.

Potential future work:
* Write a message into the sentinel file (should match the requested
  trace2 output format).
* Add a performance test to make sure that contention between multiple
  processes all writing to the same target directory does not become an
  issue.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/config/trace2.txt |  6 +++
 t/t0210-trace2-normal.sh        | 19 ++++++++
 trace2/tr2_dst.c                | 86 +++++++++++++++++++++++++++++++++
 trace2/tr2_sysenv.c             |  3 ++
 trace2/tr2_sysenv.h             |  2 +
 5 files changed, 116 insertions(+)

diff --git a/Documentation/config/trace2.txt b/Documentation/config/trace2.txt
index 2edbfb02fe..4ce0b9a6d1 100644
--- a/Documentation/config/trace2.txt
+++ b/Documentation/config/trace2.txt
@@ -54,3 +54,9 @@ trace2.destinationDebug::
 	By default, these errors are suppressed and tracing is
 	silently disabled.  May be overridden by the
 	`GIT_TRACE2_DST_DEBUG` environment variable.
+
+trace2.maxFiles::
+	Integer.  When writing trace files to a target directory, do not
+	write additional traces if we would exceed this many files. Instead,
+	write a sentinel file that will block further tracing to this
+	directory. Defaults to 0, which disables this check.
diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index ce7574edb1..59b9560109 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -186,4 +186,23 @@ test_expect_success 'using global config with include' '
 	test_cmp expect actual
 '
 
+test_expect_success "don't overload target directory" '
+	mkdir trace_target_dir &&
+	test_when_finished "rm -r trace_target_dir" &&
+	(
+		GIT_TRACE2_MAX_FILES=5 &&
+		export GIT_TRACE2_MAX_FILES &&
+		cd trace_target_dir &&
+		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
+		xargs touch <../expected_filenames.txt &&
+		cd .. &&
+		ls trace_target_dir >first_ls_output.txt &&
+		test_cmp expected_filenames.txt first_ls_output.txt &&
+		GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0
+	) &&
+	echo git-trace2-overload >>expected_filenames.txt &&
+	ls trace_target_dir >second_ls_output.txt &&
+	test_cmp expected_filenames.txt second_ls_output.txt
+'
+
 test_done
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index 5dda0ca1cd..40ec03c2d7 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -1,3 +1,5 @@
+#include <dirent.h>
+
 #include "cache.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
@@ -8,6 +10,19 @@
  */
 #define MAX_AUTO_ATTEMPTS 10
 
+/*
+ * Sentinel file used to detect when we're overloading a directory with too many
+ * trace files.
+ */
+#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
+
+/*
+ * When set to zero, disables directory overload checking. Otherwise, controls
+ * how many files we can write to a directory before entering overload mode.
+ * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
+ */
+static int tr2env_max_files = 0;
+
 static int tr2_dst_want_warning(void)
 {
 	static int tr2env_dst_debug = -1;
@@ -32,6 +47,67 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
 	dst->need_close = 0;
 }
 
+/*
+ * Check to make sure we're not overloading the target directory with too many
+ * files. First get the threshold (if present) from the config or envvar. If
+ * it's zero or unset, disable this check.  Next check for the presence of a
+ * sentinel file, then check file count. If we are overloaded, create the
+ * sentinel file if it doesn't already exist.
+ *
+ * We expect that some trace processing system is gradually collecting files
+ * from the target directory; after it removes the sentinel file we'll start
+ * writing traces again.
+ */
+static int tr2_dst_overloaded(const char *tgt_prefix)
+{
+	int file_count = 0, max_files = 0, ret = 0;
+	const char *max_files_var;
+	DIR *dirp;
+	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
+	struct stat statbuf;
+
+	strbuf_addstr(&path, tgt_prefix);
+	if (!is_dir_sep(path.buf[path.len - 1])) {
+		strbuf_addch(&path, '/');
+	}
+
+	/* Get the config or envvar and decide if we should continue this check */
+	max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
+	if (max_files_var && *max_files_var && ((max_files = atoi(max_files_var)) >= 0))
+		tr2env_max_files = max_files;
+
+	if (!tr2env_max_files) {
+		ret = 0;
+		goto cleanup;
+	}
+
+	/* check sentinel */
+	strbuf_addstr(&sentinel_path, path.buf);
+	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
+	if (!stat(sentinel_path.buf, &statbuf)) {
+		ret = 1;
+		goto cleanup;
+	}
+
+	/* check file count */
+	dirp = opendir(path.buf);
+	while (file_count < tr2env_max_files && dirp && readdir(dirp))
+		file_count++;
+	if (dirp)
+		closedir(dirp);
+
+	if (file_count >= tr2env_max_files) {
+		creat(sentinel_path.buf, S_IRUSR | S_IWUSR);
+		ret = 1;
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&path);
+	strbuf_release(&sentinel_path);
+	return ret;
+}
+
 static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 {
 	int fd;
@@ -50,6 +126,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 	strbuf_addstr(&path, sid);
 	base_path_len = path.len;
 
+	if (tr2_dst_overloaded(tgt_prefix)) {
+		strbuf_release(&path);
+		if (tr2_dst_want_warning())
+			warning("trace2: not opening %s trace file due to too "
+				"many files in target directory %s",
+				tr2_sysenv_display_name(dst->sysenv_var),
+				tgt_prefix);
+		return 0;
+	}
+
 	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
 		if (attempt_count > 0) {
 			strbuf_setlen(&path, base_path_len);
diff --git a/trace2/tr2_sysenv.c b/trace2/tr2_sysenv.c
index 5958cfc424..3c3792eca2 100644
--- a/trace2/tr2_sysenv.c
+++ b/trace2/tr2_sysenv.c
@@ -49,6 +49,9 @@ static struct tr2_sysenv_entry tr2_sysenv_settings[] = {
 				       "trace2.perftarget" },
 	[TR2_SYSENV_PERF_BRIEF]    = { "GIT_TRACE2_PERF_BRIEF",
 				       "trace2.perfbrief" },
+
+	[TR2_SYSENV_MAX_FILES]     = { "GIT_TRACE2_MAX_FILES",
+				       "trace2.maxfiles" },
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_sysenv.h b/trace2/tr2_sysenv.h
index 8dd82a7a56..d4364a7b85 100644
--- a/trace2/tr2_sysenv.h
+++ b/trace2/tr2_sysenv.h
@@ -24,6 +24,8 @@ enum tr2_sysenv_variable {
 	TR2_SYSENV_PERF,
 	TR2_SYSENV_PERF_BRIEF,
 
+	TR2_SYSENV_MAX_FILES,
+
 	TR2_SYSENV_MUST_BE_LAST
 };
 
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v2 2/2] trace2: don't overload target directories
  2019-08-02 22:02   ` [RFC PATCH v2 2/2] trace2: don't overload target directories Josh Steadmon
@ 2019-08-05 15:34     ` Jeff Hostetler
  2019-08-05 18:17       ` Josh Steadmon
  2019-08-05 18:01     ` SZEDER Gábor
  1 sibling, 1 reply; 40+ messages in thread
From: Jeff Hostetler @ 2019-08-05 15:34 UTC (permalink / raw)
  To: Josh Steadmon, git; +Cc: stolee



On 8/2/2019 6:02 PM, Josh Steadmon wrote:
> trace2 can write files into a target directory. With heavy usage, this
> directory can fill up with files, causing difficulty for
> trace-processing systems.
> 
> This patch adds a config option (trace2.maxFiles) to set a maximum
> number of files that trace2 will write to a target directory. The
> following behavior is enabled when the maxFiles is set to a positive
> integer:
>    When trace2 would write a file to a target directory, first check
>    whether or not the directory is overloaded. A directory is overloaded
>    if there is a sentinel file declaring an overload, or if the number of
>    files exceeds trace2.maxFiles. If the latter, create a sentinel file
>    to speed up later overload checks.
> 
> The assumption is that a separate trace-processing system is dealing
> with the generated traces; once it processes and removes the sentinel
> file, it should be safe to generate new trace files again.
> 
> The default value for trace2.maxFiles is zero, which disables the
> overload check.
> 
> The config can also be overridden with a new environment variable:
> GIT_TRACE2_MAX_FILES.
> 
> Potential future work:
> * Write a message into the sentinel file (should match the requested
>    trace2 output format).
> * Add a performance test to make sure that contention between multiple
>    processes all writing to the same target directory does not become an
>    issue.


This looks much nicer than the V1 version.  Having it be a
real feature rather than a test feature helps.

I don't see anything wrong with this.  I do worry about the
overhead a bit.  If you really have that many files in the
target directory, having every command count them at startup
might be an issue.

As an alternative, you might consider doing something like
this:

[] have an option to make the target directory path expand to
    something like "<path>/yyyymmdd/" and create the per-process
    files as "<path>/yyyymmdd/<sid>".

If there are 0, 1 or 2 directories, logging is enabled.
We assume that the post-processor is keeping up and all is well.
We need to allow 2 so that we continue to log around midnight.

If there are 3 or more directories, logging is disabled.
The post-processor is more than 24 hours behind for whatever
reason.  We assume here that the post-processor will process
and delete the oldest-named directory, so it is a valid measure
of the backlog.

I suggest "yyyymmdd" here for simplicity in this discussion
as daily log rotation is common.  If that's still overloading,
you could make it a longer prefix of the <sid>.  And include
the hour, for example.

I suggest 3 as the cutoff lower bound, because we need to allow
2 for midnight rotation.  But you may want to increase it to
allow for someone to be offline for a long weekend, for example.

Anyway, this is just a suggestion.  It would give you the
throttling, but without the need for every command to count
the contents of the target directory.

And it would still allow your post-processor to operate in
near real-time on the contents of the current day's target
directory or to hang back if that causes too much contention.

Feel free to ignore this :-)

Jeff


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v2 2/2] trace2: don't overload target directories
  2019-08-02 22:02   ` [RFC PATCH v2 2/2] trace2: don't overload target directories Josh Steadmon
  2019-08-05 15:34     ` Jeff Hostetler
@ 2019-08-05 18:01     ` SZEDER Gábor
  2019-08-05 18:09       ` Josh Steadmon
  1 sibling, 1 reply; 40+ messages in thread
From: SZEDER Gábor @ 2019-08-05 18:01 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, stolee, git

On Fri, Aug 02, 2019 at 03:02:35PM -0700, Josh Steadmon wrote:
> +test_expect_success "don't overload target directory" '
> +	mkdir trace_target_dir &&
> +	test_when_finished "rm -r trace_target_dir" &&
> +	(
> +		GIT_TRACE2_MAX_FILES=5 &&
> +		export GIT_TRACE2_MAX_FILES &&
> +		cd trace_target_dir &&
> +		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
> +		xargs touch <../expected_filenames.txt &&
> +		cd .. &&
> +		ls trace_target_dir >first_ls_output.txt &&
> +		test_cmp expected_filenames.txt first_ls_output.txt &&

Nit: what's the purpose of this 'ls' and 'test_cmp'?

It looks like they check that xargs created all the files it was told
to create.  I think that this falls into the category "We are not in
the business of verifying that the world given to us sanely works."
and is unnecessary.

> +		GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0
> +	) &&
> +	echo git-trace2-overload >>expected_filenames.txt &&
> +	ls trace_target_dir >second_ls_output.txt &&
> +	test_cmp expected_filenames.txt second_ls_output.txt
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v2 2/2] trace2: don't overload target directories
  2019-08-05 18:01     ` SZEDER Gábor
@ 2019-08-05 18:09       ` Josh Steadmon
  0 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-08-05 18:09 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, stolee, git

On 2019.08.05 20:01, SZEDER Gábor wrote:
> On Fri, Aug 02, 2019 at 03:02:35PM -0700, Josh Steadmon wrote:
> > +test_expect_success "don't overload target directory" '
> > +	mkdir trace_target_dir &&
> > +	test_when_finished "rm -r trace_target_dir" &&
> > +	(
> > +		GIT_TRACE2_MAX_FILES=5 &&
> > +		export GIT_TRACE2_MAX_FILES &&
> > +		cd trace_target_dir &&
> > +		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
> > +		xargs touch <../expected_filenames.txt &&
> > +		cd .. &&
> > +		ls trace_target_dir >first_ls_output.txt &&
> > +		test_cmp expected_filenames.txt first_ls_output.txt &&
> 
> Nit: what's the purpose of this 'ls' and 'test_cmp'?
> 
> It looks like they check that xargs created all the files it was told
> to create.  I think that this falls into the category "We are not in
> the business of verifying that the world given to us sanely works."
> and is unnecessary.

Understood. Will remove in V3.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v2 2/2] trace2: don't overload target directories
  2019-08-05 15:34     ` Jeff Hostetler
@ 2019-08-05 18:17       ` Josh Steadmon
  0 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-08-05 18:17 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, stolee

On 2019.08.05 11:34, Jeff Hostetler wrote:
> 
> 
> On 8/2/2019 6:02 PM, Josh Steadmon wrote:
> > trace2 can write files into a target directory. With heavy usage, this
> > directory can fill up with files, causing difficulty for
> > trace-processing systems.
> > 
> > This patch adds a config option (trace2.maxFiles) to set a maximum
> > number of files that trace2 will write to a target directory. The
> > following behavior is enabled when the maxFiles is set to a positive
> > integer:
> >    When trace2 would write a file to a target directory, first check
> >    whether or not the directory is overloaded. A directory is overloaded
> >    if there is a sentinel file declaring an overload, or if the number of
> >    files exceeds trace2.maxFiles. If the latter, create a sentinel file
> >    to speed up later overload checks.
> > 
> > The assumption is that a separate trace-processing system is dealing
> > with the generated traces; once it processes and removes the sentinel
> > file, it should be safe to generate new trace files again.
> > 
> > The default value for trace2.maxFiles is zero, which disables the
> > overload check.
> > 
> > The config can also be overridden with a new environment variable:
> > GIT_TRACE2_MAX_FILES.
> > 
> > Potential future work:
> > * Write a message into the sentinel file (should match the requested
> >    trace2 output format).
> > * Add a performance test to make sure that contention between multiple
> >    processes all writing to the same target directory does not become an
> >    issue.
> 
> 
> This looks much nicer than the V1 version.  Having it be a
> real feature rather than a test feature helps.
> 
> I don't see anything wrong with this.  I do worry about the
> overhead a bit.  If you really have that many files in the
> target directory, having every command count them at startup
> might be an issue.
> 
> As an alternative, you might consider doing something like
> this:
> 
> [] have an option to make the target directory path expand to
>    something like "<path>/yyyymmdd/" and create the per-process
>    files as "<path>/yyyymmdd/<sid>".
> 
> If there are 0, 1 or 2 directories, logging is enabled.
> We assume that the post-processor is keeping up and all is well.
> We need to allow 2 so that we continue to log around midnight.
> 
> If there are 3 or more directories, logging is disabled.
> The post-processor is more than 24 hours behind for whatever
> reason.  We assume here that the post-processor will process
> and delete the oldest-named directory, so it is a valid measure
> of the backlog.
> 
> I suggest "yyyymmdd" here for simplicity in this discussion
> as daily log rotation is common.  If that's still overloading,
> you could make it a longer prefix of the <sid>.  And include
> the hour, for example.
> 
> I suggest 3 as the cutoff lower bound, because we need to allow
> 2 for midnight rotation.  But you may want to increase it to
> allow for someone to be offline for a long weekend, for example.
> 
> Anyway, this is just a suggestion.  It would give you the
> throttling, but without the need for every command to count
> the contents of the target directory.
> 
> And it would still allow your post-processor to operate in
> near real-time on the contents of the current day's target
> directory or to hang back if that causes too much contention.
> 
> Feel free to ignore this :-)
> 
> Jeff

This does sound reasonable. I'll talk with our collection team and see
if this would work without requiring any changes on their side. If not,
I'll probably take a stab at this when I'm back from vacation.

Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH v3 0/3] trace2: don't overload target directories
  2019-07-29 22:20 [RFC PATCH] trace2: don't overload target directories Josh Steadmon
                   ` (3 preceding siblings ...)
  2019-08-02 22:02 ` [RFC PATCH v2 0/2] " Josh Steadmon
@ 2019-09-14  0:25 ` Josh Steadmon
  2019-09-14  0:25   ` [RFC PATCH v3 1/3] docs: mention trace2 target-dir mode in git-config Josh Steadmon
                     ` (2 more replies)
  2019-10-03 23:32 ` [PATCH v4 0/4] trace2: don't overload target directories Josh Steadmon
  2019-10-04 22:08 ` [PATCH v5 0/4] trace2: discard new traces if the target directory contains too many files Josh Steadmon
  6 siblings, 3 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-09-14  0:25 UTC (permalink / raw)
  To: git; +Cc: stolee, git, szeder.dev

This is still RFC, as I still haven't done performance testing yet. I'm
mainly looking for feedback right now on patch 3/3, which extends the
tr2_dst API with an optional function to write a custom message into the
overload sentinel file.

Changes since V2:
* Added a new patch (3/3) that allows the different trace2 targets to
  write custom messages to the overload sentinel file.
* Added a new "overload" trace2 event type.
* Bumped up the trace2 event format version.
* Moved the test from t0210 to t0212, so that we can test the custom
  writer for the event target at the same time.
* Removed some unnecessary sanity-checking in the test.
* Fixed a coccicheck complaint about strbuf_addbuf.
* Used hardcoded file modes to be consistent with the rest of the
  project.

Josh Steadmon (3):
  docs: mention trace2 target-dir mode in git-config
  trace2: don't overload target directories
  trace2: write overload message to sentinel files

 Documentation/config/trace2.txt        |   6 ++
 Documentation/technical/api-trace2.txt |  24 +++--
 Documentation/trace2-target-values.txt |   4 +-
 t/t0212-trace2-event.sh                |  17 ++++
 trace2/tr2_dst.c                       | 118 +++++++++++++++++++++++++
 trace2/tr2_dst.h                       |   3 +
 trace2/tr2_sysenv.c                    |   3 +
 trace2/tr2_sysenv.h                    |   2 +
 trace2/tr2_tgt_event.c                 |  21 ++++-
 trace2/tr2_tgt_normal.c                |   2 +-
 trace2/tr2_tgt_perf.c                  |   2 +-
 11 files changed, 191 insertions(+), 11 deletions(-)

Range-diff against v2:
1:  65e05a3db5 = 1:  eacffe250d docs: mention trace2 target-dir mode in git-config
2:  f897a11068 ! 2:  bf20ec8ea2 trace2: don't overload target directories
    @@ Documentation/config/trace2.txt: trace2.destinationDebug::
     +	write a sentinel file that will block further tracing to this
     +	directory. Defaults to 0, which disables this check.
     
    - ## t/t0210-trace2-normal.sh ##
    -@@ t/t0210-trace2-normal.sh: test_expect_success 'using global config with include' '
    + ## t/t0212-trace2-event.sh ##
    +@@ t/t0212-trace2-event.sh: test_expect_success JSON_PP 'using global config, event stream, error event' '
      	test_cmp expect actual
      '
      
    @@ t/t0210-trace2-normal.sh: test_expect_success 'using global config with include'
     +		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
     +		xargs touch <../expected_filenames.txt &&
     +		cd .. &&
    -+		ls trace_target_dir >first_ls_output.txt &&
    -+		test_cmp expected_filenames.txt first_ls_output.txt &&
    -+		GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0
    ++		GIT_TRACE2_EVENT="$(pwd)/trace_target_dir" test-tool trace2 001return 0
     +	) &&
     +	echo git-trace2-overload >>expected_filenames.txt &&
    -+	ls trace_target_dir >second_ls_output.txt &&
    -+	test_cmp expected_filenames.txt second_ls_output.txt
    ++	ls trace_target_dir >ls_output.txt &&
    ++	test_cmp expected_filenames.txt ls_output.txt
     +'
     +
      test_done
    @@ trace2/tr2_dst.c: void tr2_dst_trace_disable(struct tr2_dst *dst)
     +	}
     +
     +	/* check sentinel */
    -+	strbuf_addstr(&sentinel_path, path.buf);
    ++	strbuf_addbuf(&sentinel_path, &path);
     +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
     +	if (!stat(sentinel_path.buf, &statbuf)) {
     +		ret = 1;
    @@ trace2/tr2_dst.c: void tr2_dst_trace_disable(struct tr2_dst *dst)
     +		closedir(dirp);
     +
     +	if (file_count >= tr2env_max_files) {
    -+		creat(sentinel_path.buf, S_IRUSR | S_IWUSR);
    ++		creat(sentinel_path.buf, 0666);
     +		ret = 1;
     +		goto cleanup;
     +	}
-:  ---------- > 3:  bab45cb735 trace2: write overload message to sentinel files
-- 
2.23.0.237.gc6a4ce50a0-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH v3 1/3] docs: mention trace2 target-dir mode in git-config
  2019-09-14  0:25 ` [RFC PATCH v3 0/3] " Josh Steadmon
@ 2019-09-14  0:25   ` Josh Steadmon
  2019-09-14  0:25   ` [RFC PATCH v3 2/3] trace2: don't overload target directories Josh Steadmon
  2019-09-14  0:26   ` [RFC PATCH v3 3/3] trace2: write overload message to sentinel files Josh Steadmon
  2 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-09-14  0:25 UTC (permalink / raw)
  To: git; +Cc: stolee, git, szeder.dev

Move the description of trace2's target-directory behavior into the
shared trace2-target-values file so that it is included in both the
git-config and api-trace2 docs. Leave the SID discussion only in
api-trace2 since it's a technical detail.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt | 7 +++----
 Documentation/trace2-target-values.txt | 4 +++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 71eb081fed..80ffceada0 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -142,10 +142,9 @@ system or global config value to one of the following:
 
 include::../trace2-target-values.txt[]
 
-If the target already exists and is a directory, the traces will be
-written to files (one per process) underneath the given directory. They
-will be named according to the last component of the SID (optionally
-followed by a counter to avoid filename collisions).
+When trace files are written to a target directory, they will be named according
+to the last component of the SID (optionally followed by a counter to avoid
+filename collisions).
 
 == Trace2 API
 
diff --git a/Documentation/trace2-target-values.txt b/Documentation/trace2-target-values.txt
index 27d3c64e66..3985b6d3c2 100644
--- a/Documentation/trace2-target-values.txt
+++ b/Documentation/trace2-target-values.txt
@@ -2,7 +2,9 @@
 * `0` or `false` - Disables the target.
 * `1` or `true` - Writes to `STDERR`.
 * `[2-9]` - Writes to the already opened file descriptor.
-* `<absolute-pathname>` - Writes to the file in append mode.
+* `<absolute-pathname>` - Writes to the file in append mode. If the target
+already exists and is a directory, the traces will be written to files (one
+per process) underneath the given directory.
 * `af_unix:[<socket_type>:]<absolute-pathname>` - Write to a
 Unix DomainSocket (on platforms that support them).  Socket
 type can be either `stream` or `dgram`; if omitted Git will
-- 
2.23.0.237.gc6a4ce50a0-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH v3 2/3] trace2: don't overload target directories
  2019-09-14  0:25 ` [RFC PATCH v3 0/3] " Josh Steadmon
  2019-09-14  0:25   ` [RFC PATCH v3 1/3] docs: mention trace2 target-dir mode in git-config Josh Steadmon
@ 2019-09-14  0:25   ` Josh Steadmon
  2019-09-14  0:26   ` [RFC PATCH v3 3/3] trace2: write overload message to sentinel files Josh Steadmon
  2 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-09-14  0:25 UTC (permalink / raw)
  To: git; +Cc: stolee, git, szeder.dev

trace2 can write files into a target directory. With heavy usage, this
directory can fill up with files, causing difficulty for
trace-processing systems.

This patch adds a config option (trace2.maxFiles) to set a maximum
number of files that trace2 will write to a target directory. The
following behavior is enabled when the maxFiles is set to a positive
integer:
  When trace2 would write a file to a target directory, first check
  whether or not the directory is overloaded. A directory is overloaded
  if there is a sentinel file declaring an overload, or if the number of
  files exceeds trace2.maxFiles. If the latter, create a sentinel file
  to speed up later overload checks.

The assumption is that a separate trace-processing system is dealing
with the generated traces; once it processes and removes the sentinel
file, it should be safe to generate new trace files again.

The default value for trace2.maxFiles is zero, which disables the
overload check.

The config can also be overridden with a new environment variable:
GIT_TRACE2_MAX_FILES.

Potential future work:
* Write a message into the sentinel file (should match the requested
  trace2 output format).
* Add a performance test to make sure that contention between multiple
  processes all writing to the same target directory does not become an
  issue.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/config/trace2.txt |  6 +++
 t/t0212-trace2-event.sh         | 17 +++++++
 trace2/tr2_dst.c                | 86 +++++++++++++++++++++++++++++++++
 trace2/tr2_sysenv.c             |  3 ++
 trace2/tr2_sysenv.h             |  2 +
 5 files changed, 114 insertions(+)

diff --git a/Documentation/config/trace2.txt b/Documentation/config/trace2.txt
index 2edbfb02fe..4ce0b9a6d1 100644
--- a/Documentation/config/trace2.txt
+++ b/Documentation/config/trace2.txt
@@ -54,3 +54,9 @@ trace2.destinationDebug::
 	By default, these errors are suppressed and tracing is
 	silently disabled.  May be overridden by the
 	`GIT_TRACE2_DST_DEBUG` environment variable.
+
+trace2.maxFiles::
+	Integer.  When writing trace files to a target directory, do not
+	write additional traces if we would exceed this many files. Instead,
+	write a sentinel file that will block further tracing to this
+	directory. Defaults to 0, which disables this check.
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index ff5b9cc729..2ff97e72da 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -265,4 +265,21 @@ test_expect_success JSON_PP 'using global config, event stream, error event' '
 	test_cmp expect actual
 '
 
+test_expect_success "don't overload target directory" '
+	mkdir trace_target_dir &&
+	test_when_finished "rm -r trace_target_dir" &&
+	(
+		GIT_TRACE2_MAX_FILES=5 &&
+		export GIT_TRACE2_MAX_FILES &&
+		cd trace_target_dir &&
+		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
+		xargs touch <../expected_filenames.txt &&
+		cd .. &&
+		GIT_TRACE2_EVENT="$(pwd)/trace_target_dir" test-tool trace2 001return 0
+	) &&
+	echo git-trace2-overload >>expected_filenames.txt &&
+	ls trace_target_dir >ls_output.txt &&
+	test_cmp expected_filenames.txt ls_output.txt
+'
+
 test_done
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index 5dda0ca1cd..414053d550 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -1,3 +1,5 @@
+#include <dirent.h>
+
 #include "cache.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
@@ -8,6 +10,19 @@
  */
 #define MAX_AUTO_ATTEMPTS 10
 
+/*
+ * Sentinel file used to detect when we're overloading a directory with too many
+ * trace files.
+ */
+#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
+
+/*
+ * When set to zero, disables directory overload checking. Otherwise, controls
+ * how many files we can write to a directory before entering overload mode.
+ * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
+ */
+static int tr2env_max_files = 0;
+
 static int tr2_dst_want_warning(void)
 {
 	static int tr2env_dst_debug = -1;
@@ -32,6 +47,67 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
 	dst->need_close = 0;
 }
 
+/*
+ * Check to make sure we're not overloading the target directory with too many
+ * files. First get the threshold (if present) from the config or envvar. If
+ * it's zero or unset, disable this check.  Next check for the presence of a
+ * sentinel file, then check file count. If we are overloaded, create the
+ * sentinel file if it doesn't already exist.
+ *
+ * We expect that some trace processing system is gradually collecting files
+ * from the target directory; after it removes the sentinel file we'll start
+ * writing traces again.
+ */
+static int tr2_dst_overloaded(const char *tgt_prefix)
+{
+	int file_count = 0, max_files = 0, ret = 0;
+	const char *max_files_var;
+	DIR *dirp;
+	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
+	struct stat statbuf;
+
+	strbuf_addstr(&path, tgt_prefix);
+	if (!is_dir_sep(path.buf[path.len - 1])) {
+		strbuf_addch(&path, '/');
+	}
+
+	/* Get the config or envvar and decide if we should continue this check */
+	max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
+	if (max_files_var && *max_files_var && ((max_files = atoi(max_files_var)) >= 0))
+		tr2env_max_files = max_files;
+
+	if (!tr2env_max_files) {
+		ret = 0;
+		goto cleanup;
+	}
+
+	/* check sentinel */
+	strbuf_addbuf(&sentinel_path, &path);
+	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
+	if (!stat(sentinel_path.buf, &statbuf)) {
+		ret = 1;
+		goto cleanup;
+	}
+
+	/* check file count */
+	dirp = opendir(path.buf);
+	while (file_count < tr2env_max_files && dirp && readdir(dirp))
+		file_count++;
+	if (dirp)
+		closedir(dirp);
+
+	if (file_count >= tr2env_max_files) {
+		creat(sentinel_path.buf, 0666);
+		ret = 1;
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&path);
+	strbuf_release(&sentinel_path);
+	return ret;
+}
+
 static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 {
 	int fd;
@@ -50,6 +126,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 	strbuf_addstr(&path, sid);
 	base_path_len = path.len;
 
+	if (tr2_dst_overloaded(tgt_prefix)) {
+		strbuf_release(&path);
+		if (tr2_dst_want_warning())
+			warning("trace2: not opening %s trace file due to too "
+				"many files in target directory %s",
+				tr2_sysenv_display_name(dst->sysenv_var),
+				tgt_prefix);
+		return 0;
+	}
+
 	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
 		if (attempt_count > 0) {
 			strbuf_setlen(&path, base_path_len);
diff --git a/trace2/tr2_sysenv.c b/trace2/tr2_sysenv.c
index 5958cfc424..3c3792eca2 100644
--- a/trace2/tr2_sysenv.c
+++ b/trace2/tr2_sysenv.c
@@ -49,6 +49,9 @@ static struct tr2_sysenv_entry tr2_sysenv_settings[] = {
 				       "trace2.perftarget" },
 	[TR2_SYSENV_PERF_BRIEF]    = { "GIT_TRACE2_PERF_BRIEF",
 				       "trace2.perfbrief" },
+
+	[TR2_SYSENV_MAX_FILES]     = { "GIT_TRACE2_MAX_FILES",
+				       "trace2.maxfiles" },
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_sysenv.h b/trace2/tr2_sysenv.h
index 8dd82a7a56..d4364a7b85 100644
--- a/trace2/tr2_sysenv.h
+++ b/trace2/tr2_sysenv.h
@@ -24,6 +24,8 @@ enum tr2_sysenv_variable {
 	TR2_SYSENV_PERF,
 	TR2_SYSENV_PERF_BRIEF,
 
+	TR2_SYSENV_MAX_FILES,
+
 	TR2_SYSENV_MUST_BE_LAST
 };
 
-- 
2.23.0.237.gc6a4ce50a0-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH v3 3/3] trace2: write overload message to sentinel files
  2019-09-14  0:25 ` [RFC PATCH v3 0/3] " Josh Steadmon
  2019-09-14  0:25   ` [RFC PATCH v3 1/3] docs: mention trace2 target-dir mode in git-config Josh Steadmon
  2019-09-14  0:25   ` [RFC PATCH v3 2/3] trace2: don't overload target directories Josh Steadmon
@ 2019-09-14  0:26   ` Josh Steadmon
  2019-09-16 12:07     ` Derrick Stolee
  2 siblings, 1 reply; 40+ messages in thread
From: Josh Steadmon @ 2019-09-14  0:26 UTC (permalink / raw)
  To: git; +Cc: stolee, git, szeder.dev

Add a new "overload" event type for trace2 event destinations. Write
this event into the sentinel file created by the trace2.maxFiles
feature. Bump up the event format version since we've added a new event
type.

Writing this message into the sentinel file is useful for tracking how
often the overload protection feature is triggered in practice.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt | 17 ++++++++++--
 trace2/tr2_dst.c                       | 38 ++++++++++++++++++++++++--
 trace2/tr2_dst.h                       |  3 ++
 trace2/tr2_tgt_event.c                 | 21 ++++++++++++--
 trace2/tr2_tgt_normal.c                |  2 +-
 trace2/tr2_tgt_perf.c                  |  2 +-
 6 files changed, 74 insertions(+), 9 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 80ffceada0..ef26e47805 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -128,7 +128,7 @@ yields
 
 ------------
 $ cat ~/log.event
-{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"1","exe":"2.20.1.155.g426c96fcdb"}
+{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"2","exe":"2.20.1.155.g426c96fcdb"}
 {"event":"start","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621027Z","file":"common-main.c","line":39,"t_abs":0.001173,"argv":["git","version"]}
 {"event":"cmd_name","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621122Z","file":"git.c","line":432,"name":"version","hierarchy":"version"}
 {"event":"exit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621236Z","file":"git.c","line":662,"t_abs":0.001227,"code":0}
@@ -610,11 +610,24 @@ only present on the "start" and "atexit" events.
 {
 	"event":"version",
 	...
-	"evt":"1",		       # EVENT format version
+	"evt":"2",		       # EVENT format version
 	"exe":"2.20.1.155.g426c96fcdb" # git version
 }
 ------------
 
+`"overload"`::
+	This event is created in a sentinel file if we are overloading a target
+	trace directory (see the trace2.maxFiles config option).
++
+------------
+{
+	"event":"overload",
+	...
+	"dir":"/trace/target/dir/", # The configured trace2 target directory
+	"evt":"2",		    # EVENT format version
+}
+------------
+
 `"start"`::
 	This event contains the complete argv received by main().
 +
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index 414053d550..b72be57635 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -47,6 +47,38 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
 	dst->need_close = 0;
 }
 
+/*
+ * Create a sentinel file to note that we don't want to create new trace files
+ * in this location. The form of the sentinel file may vary based on the
+ * destination type; the default is to create an empty file, but destination
+ * types can override this by providing an overload_writer function that accepts
+ * the filename, line number, and target path.
+ */
+static void tr2_create_sentinel(struct tr2_dst *dst, const char *dir,
+				const char *sentinel_path)
+{
+	int fd;
+
+	if (dst->overload_writer) {
+		fd = open(sentinel_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
+		if (fd != -1) {
+			dst->fd = fd;
+			/*
+			 * I don't think it's particularly useful to include the
+			 * file and line here, but we expect all trace messages
+			 * (at least for "event" destinations) to include them.
+			 * So I'm adding these for consistency's sake.
+			 */
+			dst->overload_writer(__FILE__, __LINE__, dir);
+			tr2_dst_trace_disable(dst);
+		}
+	} else
+		fd = creat(sentinel_path, 0666);
+
+	if (fd != -1)
+		close(fd);
+}
+
 /*
  * Check to make sure we're not overloading the target directory with too many
  * files. First get the threshold (if present) from the config or envvar. If
@@ -58,7 +90,7 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
  * from the target directory; after it removes the sentinel file we'll start
  * writing traces again.
  */
-static int tr2_dst_overloaded(const char *tgt_prefix)
+static int tr2_dst_overloaded(struct tr2_dst *dst, const char *tgt_prefix)
 {
 	int file_count = 0, max_files = 0, ret = 0;
 	const char *max_files_var;
@@ -97,7 +129,7 @@ static int tr2_dst_overloaded(const char *tgt_prefix)
 		closedir(dirp);
 
 	if (file_count >= tr2env_max_files) {
-		creat(sentinel_path.buf, 0666);
+		tr2_create_sentinel(dst, path.buf, sentinel_path.buf);
 		ret = 1;
 		goto cleanup;
 	}
@@ -126,7 +158,7 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 	strbuf_addstr(&path, sid);
 	base_path_len = path.len;
 
-	if (tr2_dst_overloaded(tgt_prefix)) {
+	if (tr2_dst_overloaded(dst, tgt_prefix)) {
 		strbuf_release(&path);
 		if (tr2_dst_want_warning())
 			warning("trace2: not opening %s trace file due to too "
diff --git a/trace2/tr2_dst.h b/trace2/tr2_dst.h
index 3adf3bac13..dd09a9541c 100644
--- a/trace2/tr2_dst.h
+++ b/trace2/tr2_dst.h
@@ -4,11 +4,14 @@
 struct strbuf;
 #include "trace2/tr2_sysenv.h"
 
+typedef void(tr2_dst_overload_writer_t)(const char *file, int line, const char *dir);
+
 struct tr2_dst {
 	enum tr2_sysenv_variable sysenv_var;
 	int fd;
 	unsigned int initialized : 1;
 	unsigned int need_close : 1;
+	tr2_dst_overload_writer_t *overload_writer;
 };
 
 /*
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index c2852d1bd2..68cb26fc67 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -10,7 +10,9 @@
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
 
-static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0 };
+static void fn_overload_fl(const char *file, int line, const char *dir);
+
+static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0, fn_overload_fl };
 
 /*
  * The version number of the JSON data generated by the EVENT target
@@ -19,7 +21,7 @@ static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0 };
  * to update this if you just add another call to one of the existing
  * TRACE2 API methods.
  */
-#define TR2_EVENT_VERSION "1"
+#define TR2_EVENT_VERSION "2"
 
 /*
  * Region nesting limit for messages written to the event target.
@@ -107,6 +109,21 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 		jw_object_intmax(jw, "repo", repo->trace2_repo_id);
 }
 
+static void fn_overload_fl(const char *file, int line, const char *dir)
+{
+	const char *event_name = "overload";
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	jw_object_string(&jw, "dir", dir);
+	jw_object_string(&jw, "evt", TR2_EVENT_VERSION);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 static void fn_version_fl(const char *file, int line)
 {
 	const char *event_name = "version";
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 00b116d797..ffca0d3811 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -9,7 +9,7 @@
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
 
-static struct tr2_dst tr2dst_normal = { TR2_SYSENV_NORMAL, 0, 0, 0 };
+static struct tr2_dst tr2dst_normal = { TR2_SYSENV_NORMAL, 0, 0, 0, NULL };
 
 /*
  * Use the TR2_SYSENV_NORMAL_BRIEF setting to omit the "<time> <file>:<line>"
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index ea0cbbe13e..0a91e8a1f6 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -11,7 +11,7 @@
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
 
-static struct tr2_dst tr2dst_perf = { TR2_SYSENV_PERF, 0, 0, 0 };
+static struct tr2_dst tr2dst_perf = { TR2_SYSENV_PERF, 0, 0, 0, NULL };
 
 /*
  * Use TR2_SYSENV_PERF_BRIEF to omit the "<time> <file>:<line>"
-- 
2.23.0.237.gc6a4ce50a0-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v3 3/3] trace2: write overload message to sentinel files
  2019-09-14  0:26   ` [RFC PATCH v3 3/3] trace2: write overload message to sentinel files Josh Steadmon
@ 2019-09-16 12:07     ` Derrick Stolee
  2019-09-16 14:11       ` Jeff Hostetler
  2019-09-16 18:07       ` Josh Steadmon
  0 siblings, 2 replies; 40+ messages in thread
From: Derrick Stolee @ 2019-09-16 12:07 UTC (permalink / raw)
  To: Josh Steadmon, git; +Cc: git, szeder.dev

On 9/13/2019 8:26 PM, Josh Steadmon wrote:
> Add a new "overload" event type for trace2 event destinations. Write
> this event into the sentinel file created by the trace2.maxFiles
> feature. Bump up the event format version since we've added a new event
> type.
> 
> Writing this message into the sentinel file is useful for tracking how
> often the overload protection feature is triggered in practice.

Putting meaningful data into the sentinel file is valuable. It's
important to know a bit about when and why this happened. A user
would be able to inspect the modified time, and the directory info
you include is unnecessary. The data you include is only for the
log aggregator to keep valuable data around overloads.
 
> +`"overload"`::
> +	This event is created in a sentinel file if we are overloading a target
> +	trace directory (see the trace2.maxFiles config option).
> ++
> +------------
> +{
> +	"event":"overload",
> +	...
> +	"dir":"/trace/target/dir/", # The configured trace2 target directory
> +	"evt":"2",		    # EVENT format version
> +}
That said, do we really need to resort to a new event format and
event type? Could we instead use the "data" event with a key
"overload" and put the target dir in the value?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v3 3/3] trace2: write overload message to sentinel files
  2019-09-16 12:07     ` Derrick Stolee
@ 2019-09-16 14:11       ` Jeff Hostetler
  2019-09-16 18:20         ` Josh Steadmon
  2019-09-16 18:07       ` Josh Steadmon
  1 sibling, 1 reply; 40+ messages in thread
From: Jeff Hostetler @ 2019-09-16 14:11 UTC (permalink / raw)
  To: Derrick Stolee, Josh Steadmon, git; +Cc: szeder.dev



On 9/16/2019 8:07 AM, Derrick Stolee wrote:
> On 9/13/2019 8:26 PM, Josh Steadmon wrote:
>> Add a new "overload" event type for trace2 event destinations. Write
>> this event into the sentinel file created by the trace2.maxFiles
>> feature. Bump up the event format version since we've added a new event
>> type.
>>
>> Writing this message into the sentinel file is useful for tracking how
>> often the overload protection feature is triggered in practice.
> 
> Putting meaningful data into the sentinel file is valuable. It's
> important to know a bit about when and why this happened. A user
> would be able to inspect the modified time, and the directory info
> you include is unnecessary. The data you include is only for the
> log aggregator to keep valuable data around overloads.
>   
>> +`"overload"`::
>> +	This event is created in a sentinel file if we are overloading a target
>> +	trace directory (see the trace2.maxFiles config option).
>> ++
>> +------------
>> +{
>> +	"event":"overload",
>> +	...
>> +	"dir":"/trace/target/dir/", # The configured trace2 target directory
>> +	"evt":"2",		    # EVENT format version
>> +}
> That said, do we really need to resort to a new event format and
> event type? Could we instead use the "data" event with a key
> "overload" and put the target dir in the value?
> 
> Thanks,
> -Stolee
> 

If I understand the code here, the overload event/message is
only written to the sentinel file -- it is not written to a
regular trace2 log file, so regular log file consumers will
never see this event, right?

That message could be in any format, right?  And you could write
as much or as little data into the sentinel file as you want.

There's no compelling reason to extend the existing trace2 format
to have a new message type, so I'm not seeing a reason to add the
event-type nor to increment the version number.

The existing trace2 formats and messages/event-types are defined
and driven by the Trace2 API calls presented to upper layers
(consumers of the public trace2_*() functions and macros defined
in trace2.h).  This overload event doesn't fit that model.

I think it'd be better to just directly write() a message -- in
plain-text or JSON or whatever -- in tr2_create_sentinel() and
not try to piggy-back on the existing format machinery in the
tr2_tgt_*.c files.

Jeff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v3 3/3] trace2: write overload message to sentinel files
  2019-09-16 12:07     ` Derrick Stolee
  2019-09-16 14:11       ` Jeff Hostetler
@ 2019-09-16 18:07       ` Josh Steadmon
  1 sibling, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-09-16 18:07 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, git, szeder.dev

On 2019.09.16 08:07, Derrick Stolee wrote:
> On 9/13/2019 8:26 PM, Josh Steadmon wrote:
> > Add a new "overload" event type for trace2 event destinations. Write
> > this event into the sentinel file created by the trace2.maxFiles
> > feature. Bump up the event format version since we've added a new event
> > type.
> > 
> > Writing this message into the sentinel file is useful for tracking how
> > often the overload protection feature is triggered in practice.
> 
> Putting meaningful data into the sentinel file is valuable. It's
> important to know a bit about when and why this happened. A user
> would be able to inspect the modified time, and the directory info
> you include is unnecessary. The data you include is only for the
> log aggregator to keep valuable data around overloads.
>  
> > +`"overload"`::
> > +	This event is created in a sentinel file if we are overloading a target
> > +	trace directory (see the trace2.maxFiles config option).
> > ++
> > +------------
> > +{
> > +	"event":"overload",
> > +	...
> > +	"dir":"/trace/target/dir/", # The configured trace2 target directory
> > +	"evt":"2",		    # EVENT format version
> > +}
> That said, do we really need to resort to a new event format and
> event type? Could we instead use the "data" event with a key
> "overload" and put the target dir in the value?

Yeah, that sounds reasonable. Thanks for the suggestion.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v3 3/3] trace2: write overload message to sentinel files
  2019-09-16 14:11       ` Jeff Hostetler
@ 2019-09-16 18:20         ` Josh Steadmon
  2019-09-19 18:23           ` Jeff Hostetler
  0 siblings, 1 reply; 40+ messages in thread
From: Josh Steadmon @ 2019-09-16 18:20 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Derrick Stolee, git, szeder.dev

On 2019.09.16 10:11, Jeff Hostetler wrote:
> 
> 
> On 9/16/2019 8:07 AM, Derrick Stolee wrote:
> > On 9/13/2019 8:26 PM, Josh Steadmon wrote:
> > > Add a new "overload" event type for trace2 event destinations. Write
> > > this event into the sentinel file created by the trace2.maxFiles
> > > feature. Bump up the event format version since we've added a new event
> > > type.
> > > 
> > > Writing this message into the sentinel file is useful for tracking how
> > > often the overload protection feature is triggered in practice.
> > 
> > Putting meaningful data into the sentinel file is valuable. It's
> > important to know a bit about when and why this happened. A user
> > would be able to inspect the modified time, and the directory info
> > you include is unnecessary. The data you include is only for the
> > log aggregator to keep valuable data around overloads.
> > > +`"overload"`::
> > > +	This event is created in a sentinel file if we are overloading a target
> > > +	trace directory (see the trace2.maxFiles config option).
> > > ++
> > > +------------
> > > +{
> > > +	"event":"overload",
> > > +	...
> > > +	"dir":"/trace/target/dir/", # The configured trace2 target directory
> > > +	"evt":"2",		    # EVENT format version
> > > +}
> > That said, do we really need to resort to a new event format and
> > event type? Could we instead use the "data" event with a key
> > "overload" and put the target dir in the value?
> > 
> > Thanks,
> > -Stolee
> > 
> 
> If I understand the code here, the overload event/message is
> only written to the sentinel file -- it is not written to a
> regular trace2 log file, so regular log file consumers will
> never see this event, right?

Well, I guess it's hard to define what is a "regular log file consumer".
In our case, our collection system will treat sentinel files like any
other trace file, so it's useful to have it match the expected trace
format.

At least for our use, we don't want the sentinel files treated
specially, because we want the log collection system to just do its
thing and remove the file after processing, which lets Git know that
it's ok to start writing traces again.

> That message could be in any format, right?  And you could write
> as much or as little data into the sentinel file as you want.

To me it seems that it would be less surprising on the users' side if
any data written to the sentinel file matches the format of the
requested traces. If I have an automated process that's reading JSON
from a directory full of files, I don't want to have to add a special
case where one file might have perf-format data (or vice versa).

> There's no compelling reason to extend the existing trace2 format
> to have a new message type, so I'm not seeing a reason to add the
> event-type nor to increment the version number.
> 
> The existing trace2 formats and messages/event-types are defined
> and driven by the Trace2 API calls presented to upper layers
> (consumers of the public trace2_*() functions and macros defined
> in trace2.h).  This overload event doesn't fit that model.

Yeah, I did feel like this might be overkill. Do you think Stolee's
suggestion to use a "data" event instead would be acceptable?

> I think it'd be better to just directly write() a message -- in
> plain-text or JSON or whatever -- in tr2_create_sentinel() and
> not try to piggy-back on the existing format machinery in the
> tr2_tgt_*.c files.

I had a version that did this originally, but I don't really like having
an unexpected special case where we just write a static JSON string. It
feels like an ugly corner case, and would be surprising to users, IMO.
But if everyone thinks this is a better approach, I suppose I could just
add a switch statement in tr2_create_sentinel() that looks at the
sysenv_var field of the tr2_dst.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v3 3/3] trace2: write overload message to sentinel files
  2019-09-16 18:20         ` Josh Steadmon
@ 2019-09-19 18:23           ` Jeff Hostetler
  2019-09-19 22:47             ` Josh Steadmon
  0 siblings, 1 reply; 40+ messages in thread
From: Jeff Hostetler @ 2019-09-19 18:23 UTC (permalink / raw)
  To: Josh Steadmon, Derrick Stolee, git, szeder.dev



On 9/16/2019 2:20 PM, Josh Steadmon wrote:
> On 2019.09.16 10:11, Jeff Hostetler wrote:
>>
>>
>> On 9/16/2019 8:07 AM, Derrick Stolee wrote:
>>> On 9/13/2019 8:26 PM, Josh Steadmon wrote:
>>>> Add a new "overload" event type for trace2 event destinations. Write
>>>> this event into the sentinel file created by the trace2.maxFiles
>>>> feature. Bump up the event format version since we've added a new event
>>>> type.
>>>>
>>>> Writing this message into the sentinel file is useful for tracking how
>>>> often the overload protection feature is triggered in practice.
>>>
>>> Putting meaningful data into the sentinel file is valuable. It's
>>> important to know a bit about when and why this happened. A user
>>> would be able to inspect the modified time, and the directory info
>>> you include is unnecessary. The data you include is only for the
>>> log aggregator to keep valuable data around overloads.
>>>> +`"overload"`::
>>>> +	This event is created in a sentinel file if we are overloading a target
>>>> +	trace directory (see the trace2.maxFiles config option).
>>>> ++
>>>> +------------
>>>> +{
>>>> +	"event":"overload",
>>>> +	...
>>>> +	"dir":"/trace/target/dir/", # The configured trace2 target directory
>>>> +	"evt":"2",		    # EVENT format version
>>>> +}
>>> That said, do we really need to resort to a new event format and
>>> event type? Could we instead use the "data" event with a key
>>> "overload" and put the target dir in the value?
>>>
>>> Thanks,
>>> -Stolee
>>>
>>
>> If I understand the code here, the overload event/message is
>> only written to the sentinel file -- it is not written to a
>> regular trace2 log file, so regular log file consumers will
>> never see this event, right?
> 
> Well, I guess it's hard to define what is a "regular log file consumer".
> In our case, our collection system will treat sentinel files like any
> other trace file, so it's useful to have it match the expected trace
> format.
> 
> At least for our use, we don't want the sentinel files treated
> specially, because we want the log collection system to just do its
> thing and remove the file after processing, which lets Git know that
> it's ok to start writing traces again.
> 
>> That message could be in any format, right?  And you could write
>> as much or as little data into the sentinel file as you want.
> 
> To me it seems that it would be less surprising on the users' side if
> any data written to the sentinel file matches the format of the
> requested traces. If I have an automated process that's reading JSON
> from a directory full of files, I don't want to have to add a special
> case where one file might have perf-format data (or vice versa).
> 
>> There's no compelling reason to extend the existing trace2 format
>> to have a new message type, so I'm not seeing a reason to add the
>> event-type nor to increment the version number.
>>
>> The existing trace2 formats and messages/event-types are defined
>> and driven by the Trace2 API calls presented to upper layers
>> (consumers of the public trace2_*() functions and macros defined
>> in trace2.h).  This overload event doesn't fit that model.
> 
> Yeah, I did feel like this might be overkill. Do you think Stolee's
> suggestion to use a "data" event instead would be acceptable?
> 
>> I think it'd be better to just directly write() a message -- in
>> plain-text or JSON or whatever -- in tr2_create_sentinel() and
>> not try to piggy-back on the existing format machinery in the
>> tr2_tgt_*.c files.
> 
> I had a version that did this originally, but I don't really like having
> an unexpected special case where we just write a static JSON string. It
> feels like an ugly corner case, and would be surprising to users, IMO.
> But if everyone thinks this is a better approach, I suppose I could just
> add a switch statement in tr2_create_sentinel() that looks at the
> sysenv_var field of the tr2_dst.
> 


You make some good points.  I suppose it would be good to be able
to parse the overload file using the same reader/scheme as the
other events.  Well, at least for the JSON format; the other formats
don't really matter for your purposes anyway.

I am concerned that the new "overload" event will be the only event
in the file and therefore replace the "version" event in those files.
That is, we'll break the invariant that all JSON files begin with a
"version" event that containing the event version string.  That is,
in the current proposal, the format becomes:

     v2 ::= <overload> | <<v1>>
     v1 ::= <version> <start> ... <atexit>

V1 readers were promised that the first event in the file would
always be a <version> event.  And that they can dispatch on the
version.evt field.  V1 readers won't recognize the <overload> event
and they won't know to look at the overload.evt field.  That might
cause V1 parsers to throw a harder error than a simpler version
mismatch.

Just using a "data" event also feels wrong for the same reasons.
At that point in tr2_create_sentinel(), a new "data" event would
just give us:

     V2 ::= <data key="overload", value="true"> | <<v1>>
     v1 ::= <version> <start> ... <atexit>

Having said that I wonder if it would be better to have
tr2_create_sentinel() just set a flag (and leave the fd open).
And then either add the new event as:

     V2 ::= <version evt=2> <overload dir=path, max=n> | <<v1>>

or just add a column to the <version> event (and go ahead and
let the overload file be a full trace2 log from the command):

     V1 ::= <version evt=1, [overload=true]> <start> ... <atexit>

Does that make sense??

Jeff




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v3 3/3] trace2: write overload message to sentinel files
  2019-09-19 18:23           ` Jeff Hostetler
@ 2019-09-19 22:47             ` Josh Steadmon
  2019-09-20 15:59               ` Jeff Hostetler
  0 siblings, 1 reply; 40+ messages in thread
From: Josh Steadmon @ 2019-09-19 22:47 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Derrick Stolee, git, szeder.dev

On 2019.09.19 14:23, Jeff Hostetler wrote:
> 
> 
> On 9/16/2019 2:20 PM, Josh Steadmon wrote:
> > On 2019.09.16 10:11, Jeff Hostetler wrote:
> > > 
> > > 
> > > On 9/16/2019 8:07 AM, Derrick Stolee wrote:
> > > > On 9/13/2019 8:26 PM, Josh Steadmon wrote:
> > > > > Add a new "overload" event type for trace2 event destinations. Write
> > > > > this event into the sentinel file created by the trace2.maxFiles
> > > > > feature. Bump up the event format version since we've added a new event
> > > > > type.
> > > > > 
> > > > > Writing this message into the sentinel file is useful for tracking how
> > > > > often the overload protection feature is triggered in practice.
> > > > 
> > > > Putting meaningful data into the sentinel file is valuable. It's
> > > > important to know a bit about when and why this happened. A user
> > > > would be able to inspect the modified time, and the directory info
> > > > you include is unnecessary. The data you include is only for the
> > > > log aggregator to keep valuable data around overloads.
> > > > > +`"overload"`::
> > > > > +	This event is created in a sentinel file if we are overloading a target
> > > > > +	trace directory (see the trace2.maxFiles config option).
> > > > > ++
> > > > > +------------
> > > > > +{
> > > > > +	"event":"overload",
> > > > > +	...
> > > > > +	"dir":"/trace/target/dir/", # The configured trace2 target directory
> > > > > +	"evt":"2",		    # EVENT format version
> > > > > +}
> > > > That said, do we really need to resort to a new event format and
> > > > event type? Could we instead use the "data" event with a key
> > > > "overload" and put the target dir in the value?
> > > > 
> > > > Thanks,
> > > > -Stolee
> > > > 
> > > 
> > > If I understand the code here, the overload event/message is
> > > only written to the sentinel file -- it is not written to a
> > > regular trace2 log file, so regular log file consumers will
> > > never see this event, right?
> > 
> > Well, I guess it's hard to define what is a "regular log file consumer".
> > In our case, our collection system will treat sentinel files like any
> > other trace file, so it's useful to have it match the expected trace
> > format.
> > 
> > At least for our use, we don't want the sentinel files treated
> > specially, because we want the log collection system to just do its
> > thing and remove the file after processing, which lets Git know that
> > it's ok to start writing traces again.
> > 
> > > That message could be in any format, right?  And you could write
> > > as much or as little data into the sentinel file as you want.
> > 
> > To me it seems that it would be less surprising on the users' side if
> > any data written to the sentinel file matches the format of the
> > requested traces. If I have an automated process that's reading JSON
> > from a directory full of files, I don't want to have to add a special
> > case where one file might have perf-format data (or vice versa).
> > 
> > > There's no compelling reason to extend the existing trace2 format
> > > to have a new message type, so I'm not seeing a reason to add the
> > > event-type nor to increment the version number.
> > > 
> > > The existing trace2 formats and messages/event-types are defined
> > > and driven by the Trace2 API calls presented to upper layers
> > > (consumers of the public trace2_*() functions and macros defined
> > > in trace2.h).  This overload event doesn't fit that model.
> > 
> > Yeah, I did feel like this might be overkill. Do you think Stolee's
> > suggestion to use a "data" event instead would be acceptable?
> > 
> > > I think it'd be better to just directly write() a message -- in
> > > plain-text or JSON or whatever -- in tr2_create_sentinel() and
> > > not try to piggy-back on the existing format machinery in the
> > > tr2_tgt_*.c files.
> > 
> > I had a version that did this originally, but I don't really like having
> > an unexpected special case where we just write a static JSON string. It
> > feels like an ugly corner case, and would be surprising to users, IMO.
> > But if everyone thinks this is a better approach, I suppose I could just
> > add a switch statement in tr2_create_sentinel() that looks at the
> > sysenv_var field of the tr2_dst.
> > 
> 
> 
> You make some good points.  I suppose it would be good to be able
> to parse the overload file using the same reader/scheme as the
> other events.  Well, at least for the JSON format; the other formats
> don't really matter for your purposes anyway.
> 
> I am concerned that the new "overload" event will be the only event
> in the file and therefore replace the "version" event in those files.
> That is, we'll break the invariant that all JSON files begin with a
> "version" event that containing the event version string.  That is,
> in the current proposal, the format becomes:
> 
>     v2 ::= <overload> | <<v1>>
>     v1 ::= <version> <start> ... <atexit>
> 
> V1 readers were promised that the first event in the file would
> always be a <version> event.

Ah interesting, I wasn't aware of this. Thanks for clarifying. I do see
that there's a recommendation that trace2_initialize() should be called
"as early as possible", but perhaps I should add a note to the docs?


> And that they can dispatch on the
> version.evt field.  V1 readers won't recognize the <overload> event
> and they won't know to look at the overload.evt field.  That might
> cause V1 parsers to throw a harder error than a simpler version
> mismatch.
> 
> Just using a "data" event also feels wrong for the same reasons.
> At that point in tr2_create_sentinel(), a new "data" event would
> just give us:
> 
>     V2 ::= <data key="overload", value="true"> | <<v1>>
>     v1 ::= <version> <start> ... <atexit>
> 
> Having said that I wonder if it would be better to have
> tr2_create_sentinel() just set a flag (and leave the fd open).
> And then either add the new event as:
> 
>     V2 ::= <version evt=2> <overload dir=path, max=n> | <<v1>>
> 
> or just add a column to the <version> event (and go ahead and
> let the overload file be a full trace2 log from the command):
> 
>     V1 ::= <version evt=1, [overload=true]> <start> ... <atexit>
> 
> Does that make sense??

Yeah, that seems reasonable. We're already adding one more file to the
target directory, so we might as well include the traces from this run.

However, you're still keeping evt=1 here; do we make any promises about
fields within events being constant in a particular event version? After
scanning api-trace2.txt, I don't see any explicit description of what
may or may not change within a version, and the idea that we might add
new fields would be surprising at least to me. I'm fine either way so
long as it's documented. Would a section like the following in
api-trace2.txt be acceptable?

"""
Adding new fields to events does not require a change in the EVT.
Removing fields, or adding or removing event types does require an
increment to the EVT.
"""

> 
> Jeff


Thanks for the review,
Josh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH v3 3/3] trace2: write overload message to sentinel files
  2019-09-19 22:47             ` Josh Steadmon
@ 2019-09-20 15:59               ` Jeff Hostetler
  0 siblings, 0 replies; 40+ messages in thread
From: Jeff Hostetler @ 2019-09-20 15:59 UTC (permalink / raw)
  To: Josh Steadmon, Derrick Stolee, git, szeder.dev



On 9/19/2019 6:47 PM, Josh Steadmon wrote:
> On 2019.09.19 14:23, Jeff Hostetler wrote:
>>
>>
>> On 9/16/2019 2:20 PM, Josh Steadmon wrote:
>>> On 2019.09.16 10:11, Jeff Hostetler wrote:
>>>>
>>>>
>>>> On 9/16/2019 8:07 AM, Derrick Stolee wrote:
>>>>> On 9/13/2019 8:26 PM, Josh Steadmon wrote:
>>>>>> Add a new "overload" event type for trace2 event destinations. Write
>>>>>> this event into the sentinel file created by the trace2.maxFiles
>>>>>> feature. Bump up the event format version since we've added a new event
>>>>>> type.
>>>>>>
>>>>>> Writing this message into the sentinel file is useful for tracking how
>>>>>> often the overload protection feature is triggered in practice.
>>>>>
>>>>> Putting meaningful data into the sentinel file is valuable. It's
>>>>> important to know a bit about when and why this happened. A user
>>>>> would be able to inspect the modified time, and the directory info
>>>>> you include is unnecessary. The data you include is only for the
>>>>> log aggregator to keep valuable data around overloads.
>>>>>> +`"overload"`::
>>>>>> +	This event is created in a sentinel file if we are overloading a target
>>>>>> +	trace directory (see the trace2.maxFiles config option).
>>>>>> ++
>>>>>> +------------
>>>>>> +{
>>>>>> +	"event":"overload",
>>>>>> +	...
>>>>>> +	"dir":"/trace/target/dir/", # The configured trace2 target directory
>>>>>> +	"evt":"2",		    # EVENT format version
>>>>>> +}
>>>>> That said, do we really need to resort to a new event format and
>>>>> event type? Could we instead use the "data" event with a key
>>>>> "overload" and put the target dir in the value?
>>>>>
>>>>> Thanks,
>>>>> -Stolee
>>>>>
>>>>
>>>> If I understand the code here, the overload event/message is
>>>> only written to the sentinel file -- it is not written to a
>>>> regular trace2 log file, so regular log file consumers will
>>>> never see this event, right?
>>>
>>> Well, I guess it's hard to define what is a "regular log file consumer".
>>> In our case, our collection system will treat sentinel files like any
>>> other trace file, so it's useful to have it match the expected trace
>>> format.
>>>
>>> At least for our use, we don't want the sentinel files treated
>>> specially, because we want the log collection system to just do its
>>> thing and remove the file after processing, which lets Git know that
>>> it's ok to start writing traces again.
>>>
>>>> That message could be in any format, right?  And you could write
>>>> as much or as little data into the sentinel file as you want.
>>>
>>> To me it seems that it would be less surprising on the users' side if
>>> any data written to the sentinel file matches the format of the
>>> requested traces. If I have an automated process that's reading JSON
>>> from a directory full of files, I don't want to have to add a special
>>> case where one file might have perf-format data (or vice versa).
>>>
>>>> There's no compelling reason to extend the existing trace2 format
>>>> to have a new message type, so I'm not seeing a reason to add the
>>>> event-type nor to increment the version number.
>>>>
>>>> The existing trace2 formats and messages/event-types are defined
>>>> and driven by the Trace2 API calls presented to upper layers
>>>> (consumers of the public trace2_*() functions and macros defined
>>>> in trace2.h).  This overload event doesn't fit that model.
>>>
>>> Yeah, I did feel like this might be overkill. Do you think Stolee's
>>> suggestion to use a "data" event instead would be acceptable?
>>>
>>>> I think it'd be better to just directly write() a message -- in
>>>> plain-text or JSON or whatever -- in tr2_create_sentinel() and
>>>> not try to piggy-back on the existing format machinery in the
>>>> tr2_tgt_*.c files.
>>>
>>> I had a version that did this originally, but I don't really like having
>>> an unexpected special case where we just write a static JSON string. It
>>> feels like an ugly corner case, and would be surprising to users, IMO.
>>> But if everyone thinks this is a better approach, I suppose I could just
>>> add a switch statement in tr2_create_sentinel() that looks at the
>>> sysenv_var field of the tr2_dst.
>>>
>>
>>
>> You make some good points.  I suppose it would be good to be able
>> to parse the overload file using the same reader/scheme as the
>> other events.  Well, at least for the JSON format; the other formats
>> don't really matter for your purposes anyway.
>>
>> I am concerned that the new "overload" event will be the only event
>> in the file and therefore replace the "version" event in those files.
>> That is, we'll break the invariant that all JSON files begin with a
>> "version" event that containing the event version string.  That is,
>> in the current proposal, the format becomes:
>>
>>      v2 ::= <overload> | <<v1>>
>>      v1 ::= <version> <start> ... <atexit>
>>
>> V1 readers were promised that the first event in the file would
>> always be a <version> event.
> 
> Ah interesting, I wasn't aware of this. Thanks for clarifying. I do see
> that there's a recommendation that trace2_initialize() should be called
> "as early as possible", but perhaps I should add a note to the docs?
> 
> 
>> And that they can dispatch on the
>> version.evt field.  V1 readers won't recognize the <overload> event
>> and they won't know to look at the overload.evt field.  That might
>> cause V1 parsers to throw a harder error than a simpler version
>> mismatch.
>>
>> Just using a "data" event also feels wrong for the same reasons.
>> At that point in tr2_create_sentinel(), a new "data" event would
>> just give us:
>>
>>      V2 ::= <data key="overload", value="true"> | <<v1>>
>>      v1 ::= <version> <start> ... <atexit>
>>
>> Having said that I wonder if it would be better to have
>> tr2_create_sentinel() just set a flag (and leave the fd open).
>> And then either add the new event as:
>>
>>      V2 ::= <version evt=2> <overload dir=path, max=n> | <<v1>>
>>
>> or just add a column to the <version> event (and go ahead and
>> let the overload file be a full trace2 log from the command):
>>
>>      V1 ::= <version evt=1, [overload=true]> <start> ... <atexit>
>>
>> Does that make sense??
> 
> Yeah, that seems reasonable. We're already adding one more file to the
> target directory, so we might as well include the traces from this run.
> 
> However, you're still keeping evt=1 here; do we make any promises about
> fields within events being constant in a particular event version? After
> scanning api-trace2.txt, I don't see any explicit description of what
> may or may not change within a version, and the idea that we might add
> new fields would be surprising at least to me. I'm fine either way so
> long as it's documented. Would a section like the following in
> api-trace2.txt be acceptable?
> 
> """
> Adding new fields to events does not require a change in the EVT.
> Removing fields, or adding or removing event types does require an
> increment to the EVT.
> """

Um, good point.  Adding fields to an existing event is probably
safe -- unless it conveys a more significant change in behavior.
Something, for example, that changes the interpretation of existing
events or fields would require a version change.

I could go either way on this.  Perhaps it would be better to just
update the version number and add the event.  (Sorry to be wishy-washy
on this.)

     json ::=   <version evt=1> <<body>>
              | <version evt=2> [<overload>] [<<body>>]
     body ::= <start> ... <atexit>

That has a bit of a cleaner feel (in hindsight).

Jeff




^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4 0/4] trace2: don't overload target directories
  2019-07-29 22:20 [RFC PATCH] trace2: don't overload target directories Josh Steadmon
                   ` (4 preceding siblings ...)
  2019-09-14  0:25 ` [RFC PATCH v3 0/3] " Josh Steadmon
@ 2019-10-03 23:32 ` Josh Steadmon
  2019-10-03 23:32   ` [PATCH v4 1/4] docs: mention trace2 target-dir mode in git-config Josh Steadmon
                     ` (3 more replies)
  2019-10-04 22:08 ` [PATCH v5 0/4] trace2: discard new traces if the target directory contains too many files Josh Steadmon
  6 siblings, 4 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-03 23:32 UTC (permalink / raw)
  To: git, git; +Cc: stolee

Apologies for the delayed reply. I've added a new documentation patch
describing what we expect w.r.t. the version event and its "evt" field.
I've also reworked the final patch to allow writing a full trace session
to the sentinel file, and to make sure that the overload event comes
after the version event.

I am still incrementing the EVT and writing a new "overload" event,
since you said in the last review that this approache seems best to you.
But I am also happy to keep the EVT=1 and make "overload" a new field in
the "version" event if you've changed your mind.

Josh Steadmon (4):
  docs: mention trace2 target-dir mode in git-config
  docs: clarify trace2 version invariants
  trace2: don't overload target directories
  trace2: write overload message to sentinel files

 Documentation/config/trace2.txt        |   6 ++
 Documentation/technical/api-trace2.txt |  30 +++++--
 Documentation/trace2-target-values.txt |   4 +-
 t/t0212-trace2-event.sh                |  19 +++++
 trace2/tr2_dst.c                       | 113 ++++++++++++++++++++++---
 trace2/tr2_dst.h                       |   1 +
 trace2/tr2_sysenv.c                    |   3 +
 trace2/tr2_sysenv.h                    |   2 +
 trace2/tr2_tgt_event.c                 |  31 +++++--
 trace2/tr2_tgt_normal.c                |   2 +-
 trace2/tr2_tgt_perf.c                  |   2 +-
 11 files changed, 185 insertions(+), 28 deletions(-)

Range-diff against v3:
-:  ---------- > 1:  a757bca8f9 docs: clarify trace2 version invariants
1:  bf20ec8ea2 ! 2:  98a8440d3f trace2: don't overload target directories
    @@ Commit message
         The config can also be overridden with a new environment variable:
         GIT_TRACE2_MAX_FILES.
     
    -    Potential future work:
    -    * Write a message into the sentinel file (should match the requested
    -      trace2 output format).
    -    * Add a performance test to make sure that contention between multiple
    -      processes all writing to the same target directory does not become an
    -      issue.
    -
     
      ## Documentation/config/trace2.txt ##
    @@ trace2/tr2_dst.c: void tr2_dst_trace_disable(struct tr2_dst *dst)
     +	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
     +	struct stat statbuf;
     +
    -+	strbuf_addstr(&path, tgt_prefix);
    -+	if (!is_dir_sep(path.buf[path.len - 1])) {
    -+		strbuf_addch(&path, '/');
    -+	}
    -+
     +	/* Get the config or envvar and decide if we should continue this check */
     +	max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
     +	if (max_files_var && *max_files_var && ((max_files = atoi(max_files_var)) >= 0))
    @@ trace2/tr2_dst.c: void tr2_dst_trace_disable(struct tr2_dst *dst)
     +		goto cleanup;
     +	}
     +
    ++	strbuf_addstr(&path, tgt_prefix);
    ++	if (!is_dir_sep(path.buf[path.len - 1])) {
    ++		strbuf_addch(&path, '/');
    ++	}
    ++
     +	/* check sentinel */
     +	strbuf_addbuf(&sentinel_path, &path);
     +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
2:  bab45cb735 < -:  ---------- trace2: write overload message to sentinel files
-:  ---------- > 3:  790c5ac95a trace2: write overload message to sentinel files
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4 1/4] docs: mention trace2 target-dir mode in git-config
  2019-10-03 23:32 ` [PATCH v4 0/4] trace2: don't overload target directories Josh Steadmon
@ 2019-10-03 23:32   ` Josh Steadmon
  2019-10-03 23:32   ` [PATCH v4 2/4] docs: clarify trace2 version invariants Josh Steadmon
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-03 23:32 UTC (permalink / raw)
  To: git, git; +Cc: stolee

Move the description of trace2's target-directory behavior into the
shared trace2-target-values file so that it is included in both the
git-config and api-trace2 docs. Leave the SID discussion only in
api-trace2 since it's a technical detail.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt | 7 +++----
 Documentation/trace2-target-values.txt | 4 +++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 71eb081fed..80ffceada0 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -142,10 +142,9 @@ system or global config value to one of the following:
 
 include::../trace2-target-values.txt[]
 
-If the target already exists and is a directory, the traces will be
-written to files (one per process) underneath the given directory. They
-will be named according to the last component of the SID (optionally
-followed by a counter to avoid filename collisions).
+When trace files are written to a target directory, they will be named according
+to the last component of the SID (optionally followed by a counter to avoid
+filename collisions).
 
 == Trace2 API
 
diff --git a/Documentation/trace2-target-values.txt b/Documentation/trace2-target-values.txt
index 27d3c64e66..3985b6d3c2 100644
--- a/Documentation/trace2-target-values.txt
+++ b/Documentation/trace2-target-values.txt
@@ -2,7 +2,9 @@
 * `0` or `false` - Disables the target.
 * `1` or `true` - Writes to `STDERR`.
 * `[2-9]` - Writes to the already opened file descriptor.
-* `<absolute-pathname>` - Writes to the file in append mode.
+* `<absolute-pathname>` - Writes to the file in append mode. If the target
+already exists and is a directory, the traces will be written to files (one
+per process) underneath the given directory.
 * `af_unix:[<socket_type>:]<absolute-pathname>` - Write to a
 Unix DomainSocket (on platforms that support them).  Socket
 type can be either `stream` or `dgram`; if omitted Git will
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4 2/4] docs: clarify trace2 version invariants
  2019-10-03 23:32 ` [PATCH v4 0/4] trace2: don't overload target directories Josh Steadmon
  2019-10-03 23:32   ` [PATCH v4 1/4] docs: mention trace2 target-dir mode in git-config Josh Steadmon
@ 2019-10-03 23:32   ` Josh Steadmon
  2019-10-03 23:32   ` [PATCH v4 3/4] trace2: don't overload target directories Josh Steadmon
  2019-10-03 23:32   ` [PATCH v4 4/4] trace2: write overload message to sentinel files Josh Steadmon
  3 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-03 23:32 UTC (permalink / raw)
  To: git, git; +Cc: stolee

Make it explicit that we always want trace2 "version" events to be the
first event of any trace session. Also list the changes that would or
would not cause the EVENT format version field to be incremented.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 80ffceada0..18b79128fd 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -604,7 +604,13 @@ only present on the "start" and "atexit" events.
 ==== Event-Specific Key/Value Pairs
 
 `"version"`::
-	This event gives the version of the executable and the EVENT format.
+	This event gives the version of the executable and the EVENT format. It
+	should always be the first event in a trace session. The EVENT format
+	version will be incremented if new event types are added, if existing
+	fields are removed, or if there are significant changes in
+	interpretation of existing events or fields. Smaller changes, such as
+	adding a new field to an existing event, will not require an increment
+	to the EVENT format version.
 +
 ------------
 {
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4 3/4] trace2: don't overload target directories
  2019-10-03 23:32 ` [PATCH v4 0/4] trace2: don't overload target directories Josh Steadmon
  2019-10-03 23:32   ` [PATCH v4 1/4] docs: mention trace2 target-dir mode in git-config Josh Steadmon
  2019-10-03 23:32   ` [PATCH v4 2/4] docs: clarify trace2 version invariants Josh Steadmon
@ 2019-10-03 23:32   ` Josh Steadmon
  2019-10-04  0:25     ` Junio C Hamano
  2019-10-04  9:12     ` Johannes Schindelin
  2019-10-03 23:32   ` [PATCH v4 4/4] trace2: write overload message to sentinel files Josh Steadmon
  3 siblings, 2 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-03 23:32 UTC (permalink / raw)
  To: git, git; +Cc: stolee

trace2 can write files into a target directory. With heavy usage, this
directory can fill up with files, causing difficulty for
trace-processing systems.

This patch adds a config option (trace2.maxFiles) to set a maximum
number of files that trace2 will write to a target directory. The
following behavior is enabled when the maxFiles is set to a positive
integer:
  When trace2 would write a file to a target directory, first check
  whether or not the directory is overloaded. A directory is overloaded
  if there is a sentinel file declaring an overload, or if the number of
  files exceeds trace2.maxFiles. If the latter, create a sentinel file
  to speed up later overload checks.

The assumption is that a separate trace-processing system is dealing
with the generated traces; once it processes and removes the sentinel
file, it should be safe to generate new trace files again.

The default value for trace2.maxFiles is zero, which disables the
overload check.

The config can also be overridden with a new environment variable:
GIT_TRACE2_MAX_FILES.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/config/trace2.txt |  6 +++
 t/t0212-trace2-event.sh         | 17 +++++++
 trace2/tr2_dst.c                | 86 +++++++++++++++++++++++++++++++++
 trace2/tr2_sysenv.c             |  3 ++
 trace2/tr2_sysenv.h             |  2 +
 5 files changed, 114 insertions(+)

diff --git a/Documentation/config/trace2.txt b/Documentation/config/trace2.txt
index 2edbfb02fe..4ce0b9a6d1 100644
--- a/Documentation/config/trace2.txt
+++ b/Documentation/config/trace2.txt
@@ -54,3 +54,9 @@ trace2.destinationDebug::
 	By default, these errors are suppressed and tracing is
 	silently disabled.  May be overridden by the
 	`GIT_TRACE2_DST_DEBUG` environment variable.
+
+trace2.maxFiles::
+	Integer.  When writing trace files to a target directory, do not
+	write additional traces if we would exceed this many files. Instead,
+	write a sentinel file that will block further tracing to this
+	directory. Defaults to 0, which disables this check.
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index ff5b9cc729..2ff97e72da 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -265,4 +265,21 @@ test_expect_success JSON_PP 'using global config, event stream, error event' '
 	test_cmp expect actual
 '
 
+test_expect_success "don't overload target directory" '
+	mkdir trace_target_dir &&
+	test_when_finished "rm -r trace_target_dir" &&
+	(
+		GIT_TRACE2_MAX_FILES=5 &&
+		export GIT_TRACE2_MAX_FILES &&
+		cd trace_target_dir &&
+		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
+		xargs touch <../expected_filenames.txt &&
+		cd .. &&
+		GIT_TRACE2_EVENT="$(pwd)/trace_target_dir" test-tool trace2 001return 0
+	) &&
+	echo git-trace2-overload >>expected_filenames.txt &&
+	ls trace_target_dir >ls_output.txt &&
+	test_cmp expected_filenames.txt ls_output.txt
+'
+
 test_done
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index 5dda0ca1cd..af3405f179 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -1,3 +1,5 @@
+#include <dirent.h>
+
 #include "cache.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
@@ -8,6 +10,19 @@
  */
 #define MAX_AUTO_ATTEMPTS 10
 
+/*
+ * Sentinel file used to detect when we're overloading a directory with too many
+ * trace files.
+ */
+#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
+
+/*
+ * When set to zero, disables directory overload checking. Otherwise, controls
+ * how many files we can write to a directory before entering overload mode.
+ * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
+ */
+static int tr2env_max_files = 0;
+
 static int tr2_dst_want_warning(void)
 {
 	static int tr2env_dst_debug = -1;
@@ -32,6 +47,67 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
 	dst->need_close = 0;
 }
 
+/*
+ * Check to make sure we're not overloading the target directory with too many
+ * files. First get the threshold (if present) from the config or envvar. If
+ * it's zero or unset, disable this check.  Next check for the presence of a
+ * sentinel file, then check file count. If we are overloaded, create the
+ * sentinel file if it doesn't already exist.
+ *
+ * We expect that some trace processing system is gradually collecting files
+ * from the target directory; after it removes the sentinel file we'll start
+ * writing traces again.
+ */
+static int tr2_dst_overloaded(const char *tgt_prefix)
+{
+	int file_count = 0, max_files = 0, ret = 0;
+	const char *max_files_var;
+	DIR *dirp;
+	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
+	struct stat statbuf;
+
+	/* Get the config or envvar and decide if we should continue this check */
+	max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
+	if (max_files_var && *max_files_var && ((max_files = atoi(max_files_var)) >= 0))
+		tr2env_max_files = max_files;
+
+	if (!tr2env_max_files) {
+		ret = 0;
+		goto cleanup;
+	}
+
+	strbuf_addstr(&path, tgt_prefix);
+	if (!is_dir_sep(path.buf[path.len - 1])) {
+		strbuf_addch(&path, '/');
+	}
+
+	/* check sentinel */
+	strbuf_addbuf(&sentinel_path, &path);
+	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
+	if (!stat(sentinel_path.buf, &statbuf)) {
+		ret = 1;
+		goto cleanup;
+	}
+
+	/* check file count */
+	dirp = opendir(path.buf);
+	while (file_count < tr2env_max_files && dirp && readdir(dirp))
+		file_count++;
+	if (dirp)
+		closedir(dirp);
+
+	if (file_count >= tr2env_max_files) {
+		creat(sentinel_path.buf, 0666);
+		ret = 1;
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&path);
+	strbuf_release(&sentinel_path);
+	return ret;
+}
+
 static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 {
 	int fd;
@@ -50,6 +126,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 	strbuf_addstr(&path, sid);
 	base_path_len = path.len;
 
+	if (tr2_dst_overloaded(tgt_prefix)) {
+		strbuf_release(&path);
+		if (tr2_dst_want_warning())
+			warning("trace2: not opening %s trace file due to too "
+				"many files in target directory %s",
+				tr2_sysenv_display_name(dst->sysenv_var),
+				tgt_prefix);
+		return 0;
+	}
+
 	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
 		if (attempt_count > 0) {
 			strbuf_setlen(&path, base_path_len);
diff --git a/trace2/tr2_sysenv.c b/trace2/tr2_sysenv.c
index 5958cfc424..3c3792eca2 100644
--- a/trace2/tr2_sysenv.c
+++ b/trace2/tr2_sysenv.c
@@ -49,6 +49,9 @@ static struct tr2_sysenv_entry tr2_sysenv_settings[] = {
 				       "trace2.perftarget" },
 	[TR2_SYSENV_PERF_BRIEF]    = { "GIT_TRACE2_PERF_BRIEF",
 				       "trace2.perfbrief" },
+
+	[TR2_SYSENV_MAX_FILES]     = { "GIT_TRACE2_MAX_FILES",
+				       "trace2.maxfiles" },
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_sysenv.h b/trace2/tr2_sysenv.h
index 8dd82a7a56..d4364a7b85 100644
--- a/trace2/tr2_sysenv.h
+++ b/trace2/tr2_sysenv.h
@@ -24,6 +24,8 @@ enum tr2_sysenv_variable {
 	TR2_SYSENV_PERF,
 	TR2_SYSENV_PERF_BRIEF,
 
+	TR2_SYSENV_MAX_FILES,
+
 	TR2_SYSENV_MUST_BE_LAST
 };
 
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4 4/4] trace2: write overload message to sentinel files
  2019-10-03 23:32 ` [PATCH v4 0/4] trace2: don't overload target directories Josh Steadmon
                     ` (2 preceding siblings ...)
  2019-10-03 23:32   ` [PATCH v4 3/4] trace2: don't overload target directories Josh Steadmon
@ 2019-10-03 23:32   ` Josh Steadmon
  3 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-03 23:32 UTC (permalink / raw)
  To: git, git; +Cc: stolee

Add a new "overload" event type for trace2 event destinations. When the
trace2 overload feature creates a sentinel file, it will include the
normal trace2 output in the sentinel, along with this new overload
event.

Writing this message into the sentinel file is useful for tracking how
often the overload protection feature is triggered in practice.

Bump up the event format version since we've added a new event type.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt | 15 ++++++--
 t/t0212-trace2-event.sh                |  4 ++-
 trace2/tr2_dst.c                       | 47 ++++++++++++++------------
 trace2/tr2_dst.h                       |  1 +
 trace2/tr2_tgt_event.c                 | 31 +++++++++++++----
 trace2/tr2_tgt_normal.c                |  2 +-
 trace2/tr2_tgt_perf.c                  |  2 +-
 7 files changed, 69 insertions(+), 33 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 18b79128fd..5afd643fa6 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -128,7 +128,7 @@ yields
 
 ------------
 $ cat ~/log.event
-{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"1","exe":"2.20.1.155.g426c96fcdb"}
+{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"2","exe":"2.20.1.155.g426c96fcdb"}
 {"event":"start","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621027Z","file":"common-main.c","line":39,"t_abs":0.001173,"argv":["git","version"]}
 {"event":"cmd_name","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621122Z","file":"git.c","line":432,"name":"version","hierarchy":"version"}
 {"event":"exit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621236Z","file":"git.c","line":662,"t_abs":0.001227,"code":0}
@@ -616,11 +616,22 @@ only present on the "start" and "atexit" events.
 {
 	"event":"version",
 	...
-	"evt":"1",		       # EVENT format version
+	"evt":"2",		       # EVENT format version
 	"exe":"2.20.1.155.g426c96fcdb" # git version
 }
 ------------
 
+`"overload"`::
+	This event is created in a sentinel file if we are overloading a target
+	trace directory (see the trace2.maxFiles config option).
++
+------------
+{
+	"event":"overload",
+	...
+}
+------------
+
 `"start"`::
 	This event contains the complete argv received by main().
 +
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index 2ff97e72da..7081153a11 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -279,7 +279,9 @@ test_expect_success "don't overload target directory" '
 	) &&
 	echo git-trace2-overload >>expected_filenames.txt &&
 	ls trace_target_dir >ls_output.txt &&
-	test_cmp expected_filenames.txt ls_output.txt
+	test_cmp expected_filenames.txt ls_output.txt &&
+	head -n1 trace_target_dir/git-trace2-overload | grep \"event\":\"version\" &&
+	head -n2 trace_target_dir/git-trace2-overload | tail -n1 | grep \"event\":\"overload\"
 '
 
 test_done
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index af3405f179..eedc5a332f 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -50,15 +50,19 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
 /*
  * Check to make sure we're not overloading the target directory with too many
  * files. First get the threshold (if present) from the config or envvar. If
- * it's zero or unset, disable this check.  Next check for the presence of a
- * sentinel file, then check file count. If we are overloaded, create the
- * sentinel file if it doesn't already exist.
+ * it's zero or unset, disable this check. Next check for the presence of a
+ * sentinel file, then check file count.
+ *
+ * Returns 0 if tracing should proceed as normal. Returns 1 if the sentinel file
+ * already exists, which means tracing should be disabled. Returns -1 if we are
+ * overloaded but there was no sentinel file, which means we have created and
+ * should write traces to the sentinel file.
  *
  * We expect that some trace processing system is gradually collecting files
  * from the target directory; after it removes the sentinel file we'll start
  * writing traces again.
  */
-static int tr2_dst_overloaded(const char *tgt_prefix)
+static int tr2_dst_overloaded(struct tr2_dst *dst, const char *tgt_prefix)
 {
 	int file_count = 0, max_files = 0, ret = 0;
 	const char *max_files_var;
@@ -97,8 +101,9 @@ static int tr2_dst_overloaded(const char *tgt_prefix)
 		closedir(dirp);
 
 	if (file_count >= tr2env_max_files) {
-		creat(sentinel_path.buf, 0666);
-		ret = 1;
+		dst->overloaded = 1;
+		dst->fd = open(sentinel_path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
+		ret = -1;
 		goto cleanup;
 	}
 
@@ -110,7 +115,7 @@ static int tr2_dst_overloaded(const char *tgt_prefix)
 
 static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 {
-	int fd;
+	int overloaded;
 	const char *last_slash, *sid = tr2_sid_get();
 	struct strbuf path = STRBUF_INIT;
 	size_t base_path_len;
@@ -126,7 +131,19 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 	strbuf_addstr(&path, sid);
 	base_path_len = path.len;
 
-	if (tr2_dst_overloaded(tgt_prefix)) {
+	overloaded = tr2_dst_overloaded(dst, tgt_prefix);
+	if (!overloaded) {
+		for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
+			if (attempt_count > 0) {
+				strbuf_setlen(&path, base_path_len);
+				strbuf_addf(&path, ".%d", attempt_count);
+			}
+
+			dst->fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
+			if (dst->fd != -1)
+				break;
+		}
+	} else if (overloaded == 1) {
 		strbuf_release(&path);
 		if (tr2_dst_want_warning())
 			warning("trace2: not opening %s trace file due to too "
@@ -136,18 +153,7 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 		return 0;
 	}
 
-	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
-		if (attempt_count > 0) {
-			strbuf_setlen(&path, base_path_len);
-			strbuf_addf(&path, ".%d", attempt_count);
-		}
-
-		fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
-		if (fd != -1)
-			break;
-	}
-
-	if (fd == -1) {
+	if (dst->fd == -1) {
 		if (tr2_dst_want_warning())
 			warning("trace2: could not open '%.*s' for '%s' tracing: %s",
 				(int) base_path_len, path.buf,
@@ -161,7 +167,6 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 
 	strbuf_release(&path);
 
-	dst->fd = fd;
 	dst->need_close = 1;
 	dst->initialized = 1;
 
diff --git a/trace2/tr2_dst.h b/trace2/tr2_dst.h
index 3adf3bac13..e8abe1e490 100644
--- a/trace2/tr2_dst.h
+++ b/trace2/tr2_dst.h
@@ -9,6 +9,7 @@ struct tr2_dst {
 	int fd;
 	unsigned int initialized : 1;
 	unsigned int need_close : 1;
+	unsigned int overloaded : 1;
 };
 
 /*
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index c2852d1bd2..f27bc572c5 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -10,16 +10,17 @@
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
 
-static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0 };
+static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0, 0 };
 
 /*
- * The version number of the JSON data generated by the EVENT target
- * in this source file.  Update this if you make a significant change
- * to the JSON fields or message structure.  You probably do not need
- * to update this if you just add another call to one of the existing
- * TRACE2 API methods.
+ * The version number of the JSON data generated by the EVENT target in this
+ * source file. The version should be incremented if new event types are added,
+ * if existing fields are removed, or if there are significant changes in
+ * interpretation of existing events or fields. Smaller changes, such as adding
+ * a new field to an existing event, do not require an increment to the EVENT
+ * format version.
  */
-#define TR2_EVENT_VERSION "1"
+#define TR2_EVENT_VERSION "2"
 
 /*
  * Region nesting limit for messages written to the event target.
@@ -107,6 +108,19 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 		jw_object_intmax(jw, "repo", repo->trace2_repo_id);
 }
 
+static void fn_overload_fl(const char *file, int line)
+{
+	const char *event_name = "overload";
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 static void fn_version_fl(const char *file, int line)
 {
 	const char *event_name = "version";
@@ -120,6 +134,9 @@ static void fn_version_fl(const char *file, int line)
 
 	tr2_dst_write_line(&tr2dst_event, &jw.json);
 	jw_release(&jw);
+
+	if (tr2dst_event.overloaded)
+		fn_overload_fl(file, line);
 }
 
 static void fn_start_fl(const char *file, int line,
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 00b116d797..e91633468d 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -9,7 +9,7 @@
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
 
-static struct tr2_dst tr2dst_normal = { TR2_SYSENV_NORMAL, 0, 0, 0 };
+static struct tr2_dst tr2dst_normal = { TR2_SYSENV_NORMAL, 0, 0, 0, 0 };
 
 /*
  * Use the TR2_SYSENV_NORMAL_BRIEF setting to omit the "<time> <file>:<line>"
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index ea0cbbe13e..bebac4782d 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -11,7 +11,7 @@
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
 
-static struct tr2_dst tr2dst_perf = { TR2_SYSENV_PERF, 0, 0, 0 };
+static struct tr2_dst tr2dst_perf = { TR2_SYSENV_PERF, 0, 0, 0, 0 };
 
 /*
  * Use TR2_SYSENV_PERF_BRIEF to omit the "<time> <file>:<line>"
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/4] trace2: don't overload target directories
  2019-10-03 23:32   ` [PATCH v4 3/4] trace2: don't overload target directories Josh Steadmon
@ 2019-10-04  0:25     ` Junio C Hamano
  2019-10-04 21:57       ` Josh Steadmon
  2019-10-04  9:12     ` Johannes Schindelin
  1 sibling, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2019-10-04  0:25 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, git, stolee

Josh Steadmon <steadmon@google.com> writes:

> trace2 can write files into a target directory. With heavy usage, this
> directory can fill up with files, causing difficulty for
> trace-processing systems.

Sorry for not mentioning this, but "don't overload" is a suboptimal
keyword for the entire topic and for this particular step for a few
reasons.  For one, "overload" is an overloaded verb that gives an
incorrect impression that the problem you are dealing with is that
the target directory you specify is (mis)used for other purposes,
which is not the case.  You instead refrain from creating too many
files.  The other (which is probably more serious) is that it is
unclear what approach you chose to solve the "directory ends up
holding too many files".  One could simply discard new traces to do
so, one could concatenate to existing files to avoid creating new
files, one could even cycle the directory (i.e. path/to/log may
become path/to/log.old.1 and path/to/log is recreated as an empty
directory when it gets a new file).

    trace2: discard new traces when a target directory has too many files

or something would convey the problem you are solving (i.e. "too
many files" implying negative performance and usability impact
coming from it) and solution (i.e. "discard new traces"), if it is
the approach you have chosen.

> +	/* check sentinel */
> +	strbuf_addbuf(&sentinel_path, &path);
> +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> +	if (!stat(sentinel_path.buf, &statbuf)) {
> +		ret = 1;
> +		goto cleanup;
> +	}
> +
> +	/* check file count */
> +	dirp = opendir(path.buf);
> +	while (file_count < tr2env_max_files && dirp && readdir(dirp))
> +		file_count++;
> +	if (dirp)
> +		closedir(dirp);

So, until we accumulate too many files in the directory, every
process when it starts tracing will scan the output directory.
Hopefully the max is not set to too large a value.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/4] trace2: don't overload target directories
  2019-10-03 23:32   ` [PATCH v4 3/4] trace2: don't overload target directories Josh Steadmon
  2019-10-04  0:25     ` Junio C Hamano
@ 2019-10-04  9:12     ` Johannes Schindelin
  2019-10-04 22:05       ` Josh Steadmon
  1 sibling, 1 reply; 40+ messages in thread
From: Johannes Schindelin @ 2019-10-04  9:12 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, git, stolee

Hi Josh,

On Thu, 3 Oct 2019, Josh Steadmon wrote:

> [...]
> diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
> index 5dda0ca1cd..af3405f179 100644
> --- a/trace2/tr2_dst.c
> +++ b/trace2/tr2_dst.c
> @@ -1,3 +1,5 @@
> +#include <dirent.h>
> +

This completely breaks the Windows build:

	In file included from trace2/tr2_dst.c:1:
	compat/win32/dirent.h:13:14: error: 'MAX_PATH' undeclared here (not in a
	function)
	   13 |  char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
	      |              ^~~~~~~~

See the full build log in all its glory here:

https://dev.azure.com/gitgitgadget/git/_build/results?buildId=17409&view=logs&jobId=fd490c07-0b22-5182-fac9-6d67fe1e939b&taskId=ce91d5d6-0c55-50f5-8ab9-6695c03ab102&lineStart=252&lineEnd=255&colStart=1&colEnd=30

The fix is as easy as probably surprising: simply delete that
`#include`. It is redundant anyway:
https://github.com/git/git/blob/v2.23.0/git-compat-util.h#L184

Deleting that `#include` line from your patch not only lets the file
compile cleanly on Windows, it also conforms to our coding style where
`cache.h` or `git-compat-util.h` need to be the first `#include`. That
rule's purpose is precisely to prevent platform-specific compile errors
like the one shown above.

Ciao,
Johannes

>  #include "cache.h"
>  #include "trace2/tr2_dst.h"
>  #include "trace2/tr2_sid.h"
> @@ -8,6 +10,19 @@
>   */
>  #define MAX_AUTO_ATTEMPTS 10
>
> +/*
> + * Sentinel file used to detect when we're overloading a directory with too many
> + * trace files.
> + */
> +#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
> +
> +/*
> + * When set to zero, disables directory overload checking. Otherwise, controls
> + * how many files we can write to a directory before entering overload mode.
> + * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
> + */
> +static int tr2env_max_files = 0;
> +
>  static int tr2_dst_want_warning(void)
>  {
>  	static int tr2env_dst_debug = -1;
> @@ -32,6 +47,67 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
>  	dst->need_close = 0;
>  }
>
> +/*
> + * Check to make sure we're not overloading the target directory with too many
> + * files. First get the threshold (if present) from the config or envvar. If
> + * it's zero or unset, disable this check.  Next check for the presence of a
> + * sentinel file, then check file count. If we are overloaded, create the
> + * sentinel file if it doesn't already exist.
> + *
> + * We expect that some trace processing system is gradually collecting files
> + * from the target directory; after it removes the sentinel file we'll start
> + * writing traces again.
> + */
> +static int tr2_dst_overloaded(const char *tgt_prefix)
> +{
> +	int file_count = 0, max_files = 0, ret = 0;
> +	const char *max_files_var;
> +	DIR *dirp;
> +	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
> +	struct stat statbuf;
> +
> +	/* Get the config or envvar and decide if we should continue this check */
> +	max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
> +	if (max_files_var && *max_files_var && ((max_files = atoi(max_files_var)) >= 0))
> +		tr2env_max_files = max_files;
> +
> +	if (!tr2env_max_files) {
> +		ret = 0;
> +		goto cleanup;
> +	}
> +
> +	strbuf_addstr(&path, tgt_prefix);
> +	if (!is_dir_sep(path.buf[path.len - 1])) {
> +		strbuf_addch(&path, '/');
> +	}
> +
> +	/* check sentinel */
> +	strbuf_addbuf(&sentinel_path, &path);
> +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> +	if (!stat(sentinel_path.buf, &statbuf)) {
> +		ret = 1;
> +		goto cleanup;
> +	}
> +
> +	/* check file count */
> +	dirp = opendir(path.buf);
> +	while (file_count < tr2env_max_files && dirp && readdir(dirp))
> +		file_count++;
> +	if (dirp)
> +		closedir(dirp);
> +
> +	if (file_count >= tr2env_max_files) {
> +		creat(sentinel_path.buf, 0666);
> +		ret = 1;
> +		goto cleanup;
> +	}
> +
> +cleanup:
> +	strbuf_release(&path);
> +	strbuf_release(&sentinel_path);
> +	return ret;
> +}
> +
>  static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
>  {
>  	int fd;
> @@ -50,6 +126,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
>  	strbuf_addstr(&path, sid);
>  	base_path_len = path.len;
>
> +	if (tr2_dst_overloaded(tgt_prefix)) {
> +		strbuf_release(&path);
> +		if (tr2_dst_want_warning())
> +			warning("trace2: not opening %s trace file due to too "
> +				"many files in target directory %s",
> +				tr2_sysenv_display_name(dst->sysenv_var),
> +				tgt_prefix);
> +		return 0;
> +	}
> +
>  	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
>  		if (attempt_count > 0) {
>  			strbuf_setlen(&path, base_path_len);
> diff --git a/trace2/tr2_sysenv.c b/trace2/tr2_sysenv.c
> index 5958cfc424..3c3792eca2 100644
> --- a/trace2/tr2_sysenv.c
> +++ b/trace2/tr2_sysenv.c
> @@ -49,6 +49,9 @@ static struct tr2_sysenv_entry tr2_sysenv_settings[] = {
>  				       "trace2.perftarget" },
>  	[TR2_SYSENV_PERF_BRIEF]    = { "GIT_TRACE2_PERF_BRIEF",
>  				       "trace2.perfbrief" },
> +
> +	[TR2_SYSENV_MAX_FILES]     = { "GIT_TRACE2_MAX_FILES",
> +				       "trace2.maxfiles" },
>  };
>  /* clang-format on */
>
> diff --git a/trace2/tr2_sysenv.h b/trace2/tr2_sysenv.h
> index 8dd82a7a56..d4364a7b85 100644
> --- a/trace2/tr2_sysenv.h
> +++ b/trace2/tr2_sysenv.h
> @@ -24,6 +24,8 @@ enum tr2_sysenv_variable {
>  	TR2_SYSENV_PERF,
>  	TR2_SYSENV_PERF_BRIEF,
>
> +	TR2_SYSENV_MAX_FILES,
> +
>  	TR2_SYSENV_MUST_BE_LAST
>  };
>
> --
> 2.23.0.581.g78d2f28ef7-goog
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/4] trace2: don't overload target directories
  2019-10-04  0:25     ` Junio C Hamano
@ 2019-10-04 21:57       ` Josh Steadmon
  0 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-04 21:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, git, stolee

On 2019.10.04 09:25, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
> > trace2 can write files into a target directory. With heavy usage, this
> > directory can fill up with files, causing difficulty for
> > trace-processing systems.
> 
> Sorry for not mentioning this, but "don't overload" is a suboptimal
> keyword for the entire topic and for this particular step for a few
> reasons.  For one, "overload" is an overloaded verb that gives an
> incorrect impression that the problem you are dealing with is that
> the target directory you specify is (mis)used for other purposes,
> which is not the case.  You instead refrain from creating too many
> files.  The other (which is probably more serious) is that it is
> unclear what approach you chose to solve the "directory ends up
> holding too many files".  One could simply discard new traces to do
> so, one could concatenate to existing files to avoid creating new
> files, one could even cycle the directory (i.e. path/to/log may
> become path/to/log.old.1 and path/to/log is recreated as an empty
> directory when it gets a new file).
> 
>     trace2: discard new traces when a target directory has too many files
> 
> or something would convey the problem you are solving (i.e. "too
> many files" implying negative performance and usability impact
> coming from it) and solution (i.e. "discard new traces"), if it is
> the approach you have chosen.

Understood. Reworded in V5, which will be out shortly.

> > +	/* check sentinel */
> > +	strbuf_addbuf(&sentinel_path, &path);
> > +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> > +	if (!stat(sentinel_path.buf, &statbuf)) {
> > +		ret = 1;
> > +		goto cleanup;
> > +	}
> > +
> > +	/* check file count */
> > +	dirp = opendir(path.buf);
> > +	while (file_count < tr2env_max_files && dirp && readdir(dirp))
> > +		file_count++;
> > +	if (dirp)
> > +		closedir(dirp);
> 
> So, until we accumulate too many files in the directory, every
> process when it starts tracing will scan the output directory.
> Hopefully the max is not set to too large a value.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/4] trace2: don't overload target directories
  2019-10-04  9:12     ` Johannes Schindelin
@ 2019-10-04 22:05       ` Josh Steadmon
  0 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-04 22:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, git, stolee

On 2019.10.04 11:12, Johannes Schindelin wrote:
> Hi Josh,
> 
> On Thu, 3 Oct 2019, Josh Steadmon wrote:
> 
> > [...]
> > diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
> > index 5dda0ca1cd..af3405f179 100644
> > --- a/trace2/tr2_dst.c
> > +++ b/trace2/tr2_dst.c
> > @@ -1,3 +1,5 @@
> > +#include <dirent.h>
> > +
> 
> This completely breaks the Windows build:
> 
> 	In file included from trace2/tr2_dst.c:1:
> 	compat/win32/dirent.h:13:14: error: 'MAX_PATH' undeclared here (not in a
> 	function)
> 	   13 |  char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
> 	      |              ^~~~~~~~
> 
> See the full build log in all its glory here:
> 
> https://dev.azure.com/gitgitgadget/git/_build/results?buildId=17409&view=logs&jobId=fd490c07-0b22-5182-fac9-6d67fe1e939b&taskId=ce91d5d6-0c55-50f5-8ab9-6695c03ab102&lineStart=252&lineEnd=255&colStart=1&colEnd=30
> 
> The fix is as easy as probably surprising: simply delete that
> `#include`. It is redundant anyway:
> https://github.com/git/git/blob/v2.23.0/git-compat-util.h#L184

Sorry about that. Fixed in V5, which will be out shortly. Is there a way
to trigger the Azure pipeline apart from using GitGitGadget?

> Deleting that `#include` line from your patch not only lets the file
> compile cleanly on Windows, it also conforms to our coding style where
> `cache.h` or `git-compat-util.h` need to be the first `#include`. That
> rule's purpose is precisely to prevent platform-specific compile errors
> like the one shown above.

Hmm, I wonder if Coccinelle could catch more CodingGuideline mistakes
such as this? Although it seems there are a few exceptions to this rule,
so maybe it's not a good fit in this particular case.

> Ciao,
> Johannes
> 
> >  #include "cache.h"
> >  #include "trace2/tr2_dst.h"
> >  #include "trace2/tr2_sid.h"
> > @@ -8,6 +10,19 @@
> >   */
> >  #define MAX_AUTO_ATTEMPTS 10
> >
> > +/*
> > + * Sentinel file used to detect when we're overloading a directory with too many
> > + * trace files.
> > + */
> > +#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
> > +
> > +/*
> > + * When set to zero, disables directory overload checking. Otherwise, controls
> > + * how many files we can write to a directory before entering overload mode.
> > + * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
> > + */
> > +static int tr2env_max_files = 0;
> > +
> >  static int tr2_dst_want_warning(void)
> >  {
> >  	static int tr2env_dst_debug = -1;
> > @@ -32,6 +47,67 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
> >  	dst->need_close = 0;
> >  }
> >
> > +/*
> > + * Check to make sure we're not overloading the target directory with too many
> > + * files. First get the threshold (if present) from the config or envvar. If
> > + * it's zero or unset, disable this check.  Next check for the presence of a
> > + * sentinel file, then check file count. If we are overloaded, create the
> > + * sentinel file if it doesn't already exist.
> > + *
> > + * We expect that some trace processing system is gradually collecting files
> > + * from the target directory; after it removes the sentinel file we'll start
> > + * writing traces again.
> > + */
> > +static int tr2_dst_overloaded(const char *tgt_prefix)
> > +{
> > +	int file_count = 0, max_files = 0, ret = 0;
> > +	const char *max_files_var;
> > +	DIR *dirp;
> > +	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
> > +	struct stat statbuf;
> > +
> > +	/* Get the config or envvar and decide if we should continue this check */
> > +	max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
> > +	if (max_files_var && *max_files_var && ((max_files = atoi(max_files_var)) >= 0))
> > +		tr2env_max_files = max_files;
> > +
> > +	if (!tr2env_max_files) {
> > +		ret = 0;
> > +		goto cleanup;
> > +	}
> > +
> > +	strbuf_addstr(&path, tgt_prefix);
> > +	if (!is_dir_sep(path.buf[path.len - 1])) {
> > +		strbuf_addch(&path, '/');
> > +	}
> > +
> > +	/* check sentinel */
> > +	strbuf_addbuf(&sentinel_path, &path);
> > +	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> > +	if (!stat(sentinel_path.buf, &statbuf)) {
> > +		ret = 1;
> > +		goto cleanup;
> > +	}
> > +
> > +	/* check file count */
> > +	dirp = opendir(path.buf);
> > +	while (file_count < tr2env_max_files && dirp && readdir(dirp))
> > +		file_count++;
> > +	if (dirp)
> > +		closedir(dirp);
> > +
> > +	if (file_count >= tr2env_max_files) {
> > +		creat(sentinel_path.buf, 0666);
> > +		ret = 1;
> > +		goto cleanup;
> > +	}
> > +
> > +cleanup:
> > +	strbuf_release(&path);
> > +	strbuf_release(&sentinel_path);
> > +	return ret;
> > +}
> > +
> >  static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
> >  {
> >  	int fd;
> > @@ -50,6 +126,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
> >  	strbuf_addstr(&path, sid);
> >  	base_path_len = path.len;
> >
> > +	if (tr2_dst_overloaded(tgt_prefix)) {
> > +		strbuf_release(&path);
> > +		if (tr2_dst_want_warning())
> > +			warning("trace2: not opening %s trace file due to too "
> > +				"many files in target directory %s",
> > +				tr2_sysenv_display_name(dst->sysenv_var),
> > +				tgt_prefix);
> > +		return 0;
> > +	}
> > +
> >  	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
> >  		if (attempt_count > 0) {
> >  			strbuf_setlen(&path, base_path_len);
> > diff --git a/trace2/tr2_sysenv.c b/trace2/tr2_sysenv.c
> > index 5958cfc424..3c3792eca2 100644
> > --- a/trace2/tr2_sysenv.c
> > +++ b/trace2/tr2_sysenv.c
> > @@ -49,6 +49,9 @@ static struct tr2_sysenv_entry tr2_sysenv_settings[] = {
> >  				       "trace2.perftarget" },
> >  	[TR2_SYSENV_PERF_BRIEF]    = { "GIT_TRACE2_PERF_BRIEF",
> >  				       "trace2.perfbrief" },
> > +
> > +	[TR2_SYSENV_MAX_FILES]     = { "GIT_TRACE2_MAX_FILES",
> > +				       "trace2.maxfiles" },
> >  };
> >  /* clang-format on */
> >
> > diff --git a/trace2/tr2_sysenv.h b/trace2/tr2_sysenv.h
> > index 8dd82a7a56..d4364a7b85 100644
> > --- a/trace2/tr2_sysenv.h
> > +++ b/trace2/tr2_sysenv.h
> > @@ -24,6 +24,8 @@ enum tr2_sysenv_variable {
> >  	TR2_SYSENV_PERF,
> >  	TR2_SYSENV_PERF_BRIEF,
> >
> > +	TR2_SYSENV_MAX_FILES,
> > +
> >  	TR2_SYSENV_MUST_BE_LAST
> >  };
> >
> > --
> > 2.23.0.581.g78d2f28ef7-goog
> >
> >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 0/4] trace2: discard new traces if the target directory contains too many files
  2019-07-29 22:20 [RFC PATCH] trace2: don't overload target directories Josh Steadmon
                   ` (5 preceding siblings ...)
  2019-10-03 23:32 ` [PATCH v4 0/4] trace2: don't overload target directories Josh Steadmon
@ 2019-10-04 22:08 ` Josh Steadmon
  2019-10-04 22:08   ` [PATCH v5 1/4] docs: mention trace2 target-dir mode in git-config Josh Steadmon
                     ` (3 more replies)
  6 siblings, 4 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-04 22:08 UTC (permalink / raw)
  To: git; +Cc: git, stolee, gitster, Johannes.Schindelin

Changes since V4:
* Avoid the use of the non-specific term "overload" in code, trace
  output, commit messages, and documentation.
* Remove an unnecessary <dirent.h> include that was breaking the Windows
  build.

Josh Steadmon (4):
  docs: mention trace2 target-dir mode in git-config
  docs: clarify trace2 version invariants
  trace2: discard new traces if target directory has too many files
  trace2: write discard message to sentinel files

 Documentation/config/trace2.txt        |   6 ++
 Documentation/technical/api-trace2.txt |  31 +++++--
 Documentation/trace2-target-values.txt |   4 +-
 t/t0212-trace2-event.sh                |  19 +++++
 trace2/tr2_dst.c                       | 111 ++++++++++++++++++++++---
 trace2/tr2_dst.h                       |   1 +
 trace2/tr2_sysenv.c                    |   3 +
 trace2/tr2_sysenv.h                    |   2 +
 trace2/tr2_tgt_event.c                 |  31 +++++--
 trace2/tr2_tgt_normal.c                |   2 +-
 trace2/tr2_tgt_perf.c                  |   2 +-
 11 files changed, 184 insertions(+), 28 deletions(-)

Range-diff against v4:
1:  98a8440d3f ! 1:  391051b308 trace2: don't overload target directories
    @@ Metadata
     Author: Josh Steadmon <steadmon@google.com>
     
      ## Commit message ##
    -    trace2: don't overload target directories
    +    trace2: discard new traces if target directory has too many files
     
         trace2 can write files into a target directory. With heavy usage, this
         directory can fill up with files, causing difficulty for
    @@ Commit message
         following behavior is enabled when the maxFiles is set to a positive
         integer:
           When trace2 would write a file to a target directory, first check
    -      whether or not the directory is overloaded. A directory is overloaded
    -      if there is a sentinel file declaring an overload, or if the number of
    -      files exceeds trace2.maxFiles. If the latter, create a sentinel file
    -      to speed up later overload checks.
    +      whether or not the traces should be discarded. Traces should be
    +      discarded if:
    +        * there is a sentinel file declaring that there are too many files
    +        * OR, the number of files exceeds trace2.maxFiles.
    +      In the latter case, we create a sentinel file named git-trace2-discard
    +      to speed up future checks.
     
         The assumption is that a separate trace-processing system is dealing
         with the generated traces; once it processes and removes the sentinel
         file, it should be safe to generate new trace files again.
     
    -    The default value for trace2.maxFiles is zero, which disables the
    -    overload check.
    +    The default value for trace2.maxFiles is zero, which disables the file
    +    count check.
     
         The config can also be overridden with a new environment variable:
         GIT_TRACE2_MAX_FILES.
    @@ t/t0212-trace2-event.sh: test_expect_success JSON_PP 'using global config, event
      	test_cmp expect actual
      '
      
    -+test_expect_success "don't overload target directory" '
    ++test_expect_success 'discard traces when there are too many files' '
     +	mkdir trace_target_dir &&
     +	test_when_finished "rm -r trace_target_dir" &&
     +	(
    @@ t/t0212-trace2-event.sh: test_expect_success JSON_PP 'using global config, event
     +		cd .. &&
     +		GIT_TRACE2_EVENT="$(pwd)/trace_target_dir" test-tool trace2 001return 0
     +	) &&
    -+	echo git-trace2-overload >>expected_filenames.txt &&
    ++	echo git-trace2-discard >>expected_filenames.txt &&
     +	ls trace_target_dir >ls_output.txt &&
     +	test_cmp expected_filenames.txt ls_output.txt
     +'
    @@ t/t0212-trace2-event.sh: test_expect_success JSON_PP 'using global config, event
      test_done
     
      ## trace2/tr2_dst.c ##
    -@@
    -+#include <dirent.h>
    -+
    - #include "cache.h"
    - #include "trace2/tr2_dst.h"
    - #include "trace2/tr2_sid.h"
     @@
       */
      #define MAX_AUTO_ATTEMPTS 10
      
     +/*
    -+ * Sentinel file used to detect when we're overloading a directory with too many
    -+ * trace files.
    ++ * Sentinel file used to detect when we should discard new traces to avoid
    ++ * writing too many trace files to a directory.
     + */
    -+#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
    ++#define DISCARD_SENTINEL_NAME "git-trace2-discard"
     +
     +/*
    -+ * When set to zero, disables directory overload checking. Otherwise, controls
    -+ * how many files we can write to a directory before entering overload mode.
    ++ * When set to zero, disables directory file count checks. Otherwise, controls
    ++ * how many files we can write to a directory before entering discard mode.
     + * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
     + */
     +static int tr2env_max_files = 0;
    @@ trace2/tr2_dst.c: void tr2_dst_trace_disable(struct tr2_dst *dst)
     + * from the target directory; after it removes the sentinel file we'll start
     + * writing traces again.
     + */
    -+static int tr2_dst_overloaded(const char *tgt_prefix)
    ++static int tr2_dst_too_many_files(const char *tgt_prefix)
     +{
     +	int file_count = 0, max_files = 0, ret = 0;
     +	const char *max_files_var;
    @@ trace2/tr2_dst.c: void tr2_dst_trace_disable(struct tr2_dst *dst)
     +
     +	/* check sentinel */
     +	strbuf_addbuf(&sentinel_path, &path);
    -+	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
    ++	strbuf_addstr(&sentinel_path, DISCARD_SENTINEL_NAME);
     +	if (!stat(sentinel_path.buf, &statbuf)) {
     +		ret = 1;
     +		goto cleanup;
    @@ trace2/tr2_dst.c: static int tr2_dst_try_auto_path(struct tr2_dst *dst, const ch
      	strbuf_addstr(&path, sid);
      	base_path_len = path.len;
      
    -+	if (tr2_dst_overloaded(tgt_prefix)) {
    ++	if (tr2_dst_too_many_files(tgt_prefix)) {
     +		strbuf_release(&path);
     +		if (tr2_dst_want_warning())
     +			warning("trace2: not opening %s trace file due to too "
2:  790c5ac95a ! 2:  1c3c7f01c6 trace2: write overload message to sentinel files
    @@ Metadata
     Author: Josh Steadmon <steadmon@google.com>
     
      ## Commit message ##
    -    trace2: write overload message to sentinel files
    +    trace2: write discard message to sentinel files
     
    -    Add a new "overload" event type for trace2 event destinations. When the
    -    trace2 overload feature creates a sentinel file, it will include the
    -    normal trace2 output in the sentinel, along with this new overload
    +    Add a new "discard" event type for trace2 event destinations. When the
    +    trace2 file count check creates a sentinel file, it will include the
    +    normal trace2 output in the sentinel, along with this new discard
         event.
     
         Writing this message into the sentinel file is useful for tracking how
    -    often the overload protection feature is triggered in practice.
    +    often the file count check triggers in practice.
     
         Bump up the event format version since we've added a new event type.
     
    @@ Documentation/technical/api-trace2.txt: only present on the "start" and "atexit"
      }
      ------------
      
    -+`"overload"`::
    -+	This event is created in a sentinel file if we are overloading a target
    -+	trace directory (see the trace2.maxFiles config option).
    ++`"discard"`::
    ++	This event is written to the git-trace2-discard sentinel file if there
    ++	are too many files in the target trace directory (see the
    ++	trace2.maxFiles config option).
     ++
     +------------
     +{
    -+	"event":"overload",
    ++	"event":"discard",
     +	...
     +}
     +------------
    @@ Documentation/technical/api-trace2.txt: only present on the "start" and "atexit"
      +
     
      ## t/t0212-trace2-event.sh ##
    -@@ t/t0212-trace2-event.sh: test_expect_success "don't overload target directory" '
    +@@ t/t0212-trace2-event.sh: test_expect_success 'discard traces when there are too many files' '
      	) &&
    - 	echo git-trace2-overload >>expected_filenames.txt &&
    + 	echo git-trace2-discard >>expected_filenames.txt &&
      	ls trace_target_dir >ls_output.txt &&
     -	test_cmp expected_filenames.txt ls_output.txt
     +	test_cmp expected_filenames.txt ls_output.txt &&
    -+	head -n1 trace_target_dir/git-trace2-overload | grep \"event\":\"version\" &&
    -+	head -n2 trace_target_dir/git-trace2-overload | tail -n1 | grep \"event\":\"overload\"
    ++	head -n1 trace_target_dir/git-trace2-discard | grep \"event\":\"version\" &&
    ++	head -n2 trace_target_dir/git-trace2-discard | tail -n1 | grep \"event\":\"too_many_files\"
      '
      
      test_done
    @@ trace2/tr2_dst.c: void tr2_dst_trace_disable(struct tr2_dst *dst)
     + * sentinel file, then check file count.
     + *
     + * Returns 0 if tracing should proceed as normal. Returns 1 if the sentinel file
    -+ * already exists, which means tracing should be disabled. Returns -1 if we are
    -+ * overloaded but there was no sentinel file, which means we have created and
    -+ * should write traces to the sentinel file.
    ++ * already exists, which means tracing should be disabled. Returns -1 if there
    ++ * are too many files but there was no sentinel file, which means we have
    ++ * created and should write traces to the sentinel file.
       *
       * We expect that some trace processing system is gradually collecting files
       * from the target directory; after it removes the sentinel file we'll start
       * writing traces again.
       */
    --static int tr2_dst_overloaded(const char *tgt_prefix)
    -+static int tr2_dst_overloaded(struct tr2_dst *dst, const char *tgt_prefix)
    +-static int tr2_dst_too_many_files(const char *tgt_prefix)
    ++static int tr2_dst_too_many_files(struct tr2_dst *dst, const char *tgt_prefix)
      {
      	int file_count = 0, max_files = 0, ret = 0;
      	const char *max_files_var;
    -@@ trace2/tr2_dst.c: static int tr2_dst_overloaded(const char *tgt_prefix)
    +@@ trace2/tr2_dst.c: static int tr2_dst_too_many_files(const char *tgt_prefix)
      		closedir(dirp);
      
      	if (file_count >= tr2env_max_files) {
     -		creat(sentinel_path.buf, 0666);
     -		ret = 1;
    -+		dst->overloaded = 1;
    ++		dst->too_many_files = 1;
     +		dst->fd = open(sentinel_path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
     +		ret = -1;
      		goto cleanup;
      	}
      
    -@@ trace2/tr2_dst.c: static int tr2_dst_overloaded(const char *tgt_prefix)
    +@@ trace2/tr2_dst.c: static int tr2_dst_too_many_files(const char *tgt_prefix)
      
      static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
      {
     -	int fd;
    -+	int overloaded;
    ++	int too_many_files;
      	const char *last_slash, *sid = tr2_sid_get();
      	struct strbuf path = STRBUF_INIT;
      	size_t base_path_len;
    @@ trace2/tr2_dst.c: static int tr2_dst_try_auto_path(struct tr2_dst *dst, const ch
      	strbuf_addstr(&path, sid);
      	base_path_len = path.len;
      
    --	if (tr2_dst_overloaded(tgt_prefix)) {
    -+	overloaded = tr2_dst_overloaded(dst, tgt_prefix);
    -+	if (!overloaded) {
    +-	if (tr2_dst_too_many_files(tgt_prefix)) {
    ++	too_many_files = tr2_dst_too_many_files(dst, tgt_prefix);
    ++	if (!too_many_files) {
     +		for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
     +			if (attempt_count > 0) {
     +				strbuf_setlen(&path, base_path_len);
    @@ trace2/tr2_dst.c: static int tr2_dst_try_auto_path(struct tr2_dst *dst, const ch
     +			if (dst->fd != -1)
     +				break;
     +		}
    -+	} else if (overloaded == 1) {
    ++	} else if (too_many_files == 1) {
      		strbuf_release(&path);
      		if (tr2_dst_want_warning())
      			warning("trace2: not opening %s trace file due to too "
    @@ trace2/tr2_dst.h: struct tr2_dst {
      	int fd;
      	unsigned int initialized : 1;
      	unsigned int need_close : 1;
    -+	unsigned int overloaded : 1;
    ++	unsigned int too_many_files : 1;
      };
      
      /*
    @@ trace2/tr2_tgt_event.c: static void event_fmt_prepare(const char *event_name, co
      		jw_object_intmax(jw, "repo", repo->trace2_repo_id);
      }
      
    -+static void fn_overload_fl(const char *file, int line)
    ++static void fn_too_many_files_fl(const char *file, int line)
     +{
    -+	const char *event_name = "overload";
    ++	const char *event_name = "too_many_files";
     +	struct json_writer jw = JSON_WRITER_INIT;
     +
     +	jw_object_begin(&jw, 0);
    @@ trace2/tr2_tgt_event.c: static void fn_version_fl(const char *file, int line)
      	tr2_dst_write_line(&tr2dst_event, &jw.json);
      	jw_release(&jw);
     +
    -+	if (tr2dst_event.overloaded)
    -+		fn_overload_fl(file, line);
    ++	if (tr2dst_event.too_many_files)
    ++		fn_too_many_files_fl(file, line);
      }
      
      static void fn_start_fl(const char *file, int line,
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/4] docs: mention trace2 target-dir mode in git-config
  2019-10-04 22:08 ` [PATCH v5 0/4] trace2: discard new traces if the target directory contains too many files Josh Steadmon
@ 2019-10-04 22:08   ` Josh Steadmon
  2019-10-04 22:08   ` [PATCH v5 2/4] docs: clarify trace2 version invariants Josh Steadmon
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-04 22:08 UTC (permalink / raw)
  To: git; +Cc: git, stolee, gitster, Johannes.Schindelin

Move the description of trace2's target-directory behavior into the
shared trace2-target-values file so that it is included in both the
git-config and api-trace2 docs. Leave the SID discussion only in
api-trace2 since it's a technical detail.

Change-Id: I3d052c5904684e981f295d64aa2c5d62cfaa4500
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt | 7 +++----
 Documentation/trace2-target-values.txt | 4 +++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 71eb081fed..80ffceada0 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -142,10 +142,9 @@ system or global config value to one of the following:
 
 include::../trace2-target-values.txt[]
 
-If the target already exists and is a directory, the traces will be
-written to files (one per process) underneath the given directory. They
-will be named according to the last component of the SID (optionally
-followed by a counter to avoid filename collisions).
+When trace files are written to a target directory, they will be named according
+to the last component of the SID (optionally followed by a counter to avoid
+filename collisions).
 
 == Trace2 API
 
diff --git a/Documentation/trace2-target-values.txt b/Documentation/trace2-target-values.txt
index 27d3c64e66..3985b6d3c2 100644
--- a/Documentation/trace2-target-values.txt
+++ b/Documentation/trace2-target-values.txt
@@ -2,7 +2,9 @@
 * `0` or `false` - Disables the target.
 * `1` or `true` - Writes to `STDERR`.
 * `[2-9]` - Writes to the already opened file descriptor.
-* `<absolute-pathname>` - Writes to the file in append mode.
+* `<absolute-pathname>` - Writes to the file in append mode. If the target
+already exists and is a directory, the traces will be written to files (one
+per process) underneath the given directory.
 * `af_unix:[<socket_type>:]<absolute-pathname>` - Write to a
 Unix DomainSocket (on platforms that support them).  Socket
 type can be either `stream` or `dgram`; if omitted Git will
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 2/4] docs: clarify trace2 version invariants
  2019-10-04 22:08 ` [PATCH v5 0/4] trace2: discard new traces if the target directory contains too many files Josh Steadmon
  2019-10-04 22:08   ` [PATCH v5 1/4] docs: mention trace2 target-dir mode in git-config Josh Steadmon
@ 2019-10-04 22:08   ` Josh Steadmon
  2019-10-04 22:08   ` [PATCH v5 3/4] trace2: discard new traces if target directory has too many files Josh Steadmon
  2019-10-04 22:08   ` [PATCH v5 4/4] trace2: write discard message to sentinel files Josh Steadmon
  3 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-04 22:08 UTC (permalink / raw)
  To: git; +Cc: git, stolee, gitster, Johannes.Schindelin

Make it explicit that we always want trace2 "version" events to be the
first event of any trace session. Also list the changes that would or
would not cause the EVENT format version field to be incremented.

Change-Id: I20b9ac1fa0786bcaad7e290cc805cbf45b323021
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 80ffceada0..18b79128fd 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -604,7 +604,13 @@ only present on the "start" and "atexit" events.
 ==== Event-Specific Key/Value Pairs
 
 `"version"`::
-	This event gives the version of the executable and the EVENT format.
+	This event gives the version of the executable and the EVENT format. It
+	should always be the first event in a trace session. The EVENT format
+	version will be incremented if new event types are added, if existing
+	fields are removed, or if there are significant changes in
+	interpretation of existing events or fields. Smaller changes, such as
+	adding a new field to an existing event, will not require an increment
+	to the EVENT format version.
 +
 ------------
 {
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 3/4] trace2: discard new traces if target directory has too many files
  2019-10-04 22:08 ` [PATCH v5 0/4] trace2: discard new traces if the target directory contains too many files Josh Steadmon
  2019-10-04 22:08   ` [PATCH v5 1/4] docs: mention trace2 target-dir mode in git-config Josh Steadmon
  2019-10-04 22:08   ` [PATCH v5 2/4] docs: clarify trace2 version invariants Josh Steadmon
@ 2019-10-04 22:08   ` Josh Steadmon
  2019-10-04 22:08   ` [PATCH v5 4/4] trace2: write discard message to sentinel files Josh Steadmon
  3 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-04 22:08 UTC (permalink / raw)
  To: git; +Cc: git, stolee, gitster, Johannes.Schindelin

trace2 can write files into a target directory. With heavy usage, this
directory can fill up with files, causing difficulty for
trace-processing systems.

This patch adds a config option (trace2.maxFiles) to set a maximum
number of files that trace2 will write to a target directory. The
following behavior is enabled when the maxFiles is set to a positive
integer:
  When trace2 would write a file to a target directory, first check
  whether or not the traces should be discarded. Traces should be
  discarded if:
    * there is a sentinel file declaring that there are too many files
    * OR, the number of files exceeds trace2.maxFiles.
  In the latter case, we create a sentinel file named git-trace2-discard
  to speed up future checks.

The assumption is that a separate trace-processing system is dealing
with the generated traces; once it processes and removes the sentinel
file, it should be safe to generate new trace files again.

The default value for trace2.maxFiles is zero, which disables the file
count check.

The config can also be overridden with a new environment variable:
GIT_TRACE2_MAX_FILES.

Change-Id: I8fc613e88dd99ffcdeea9d6c1f232c3670096a9e
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/config/trace2.txt |  6 +++
 t/t0212-trace2-event.sh         | 17 +++++++
 trace2/tr2_dst.c                | 84 +++++++++++++++++++++++++++++++++
 trace2/tr2_sysenv.c             |  3 ++
 trace2/tr2_sysenv.h             |  2 +
 5 files changed, 112 insertions(+)

diff --git a/Documentation/config/trace2.txt b/Documentation/config/trace2.txt
index 2edbfb02fe..4ce0b9a6d1 100644
--- a/Documentation/config/trace2.txt
+++ b/Documentation/config/trace2.txt
@@ -54,3 +54,9 @@ trace2.destinationDebug::
 	By default, these errors are suppressed and tracing is
 	silently disabled.  May be overridden by the
 	`GIT_TRACE2_DST_DEBUG` environment variable.
+
+trace2.maxFiles::
+	Integer.  When writing trace files to a target directory, do not
+	write additional traces if we would exceed this many files. Instead,
+	write a sentinel file that will block further tracing to this
+	directory. Defaults to 0, which disables this check.
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index ff5b9cc729..bf75e06e30 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -265,4 +265,21 @@ test_expect_success JSON_PP 'using global config, event stream, error event' '
 	test_cmp expect actual
 '
 
+test_expect_success 'discard traces when there are too many files' '
+	mkdir trace_target_dir &&
+	test_when_finished "rm -r trace_target_dir" &&
+	(
+		GIT_TRACE2_MAX_FILES=5 &&
+		export GIT_TRACE2_MAX_FILES &&
+		cd trace_target_dir &&
+		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
+		xargs touch <../expected_filenames.txt &&
+		cd .. &&
+		GIT_TRACE2_EVENT="$(pwd)/trace_target_dir" test-tool trace2 001return 0
+	) &&
+	echo git-trace2-discard >>expected_filenames.txt &&
+	ls trace_target_dir >ls_output.txt &&
+	test_cmp expected_filenames.txt ls_output.txt
+'
+
 test_done
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index 5dda0ca1cd..f4646bf98d 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -8,6 +8,19 @@
  */
 #define MAX_AUTO_ATTEMPTS 10
 
+/*
+ * Sentinel file used to detect when we should discard new traces to avoid
+ * writing too many trace files to a directory.
+ */
+#define DISCARD_SENTINEL_NAME "git-trace2-discard"
+
+/*
+ * When set to zero, disables directory file count checks. Otherwise, controls
+ * how many files we can write to a directory before entering discard mode.
+ * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
+ */
+static int tr2env_max_files = 0;
+
 static int tr2_dst_want_warning(void)
 {
 	static int tr2env_dst_debug = -1;
@@ -32,6 +45,67 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
 	dst->need_close = 0;
 }
 
+/*
+ * Check to make sure we're not overloading the target directory with too many
+ * files. First get the threshold (if present) from the config or envvar. If
+ * it's zero or unset, disable this check.  Next check for the presence of a
+ * sentinel file, then check file count. If we are overloaded, create the
+ * sentinel file if it doesn't already exist.
+ *
+ * We expect that some trace processing system is gradually collecting files
+ * from the target directory; after it removes the sentinel file we'll start
+ * writing traces again.
+ */
+static int tr2_dst_too_many_files(const char *tgt_prefix)
+{
+	int file_count = 0, max_files = 0, ret = 0;
+	const char *max_files_var;
+	DIR *dirp;
+	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
+	struct stat statbuf;
+
+	/* Get the config or envvar and decide if we should continue this check */
+	max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
+	if (max_files_var && *max_files_var && ((max_files = atoi(max_files_var)) >= 0))
+		tr2env_max_files = max_files;
+
+	if (!tr2env_max_files) {
+		ret = 0;
+		goto cleanup;
+	}
+
+	strbuf_addstr(&path, tgt_prefix);
+	if (!is_dir_sep(path.buf[path.len - 1])) {
+		strbuf_addch(&path, '/');
+	}
+
+	/* check sentinel */
+	strbuf_addbuf(&sentinel_path, &path);
+	strbuf_addstr(&sentinel_path, DISCARD_SENTINEL_NAME);
+	if (!stat(sentinel_path.buf, &statbuf)) {
+		ret = 1;
+		goto cleanup;
+	}
+
+	/* check file count */
+	dirp = opendir(path.buf);
+	while (file_count < tr2env_max_files && dirp && readdir(dirp))
+		file_count++;
+	if (dirp)
+		closedir(dirp);
+
+	if (file_count >= tr2env_max_files) {
+		creat(sentinel_path.buf, 0666);
+		ret = 1;
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&path);
+	strbuf_release(&sentinel_path);
+	return ret;
+}
+
 static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 {
 	int fd;
@@ -50,6 +124,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 	strbuf_addstr(&path, sid);
 	base_path_len = path.len;
 
+	if (tr2_dst_too_many_files(tgt_prefix)) {
+		strbuf_release(&path);
+		if (tr2_dst_want_warning())
+			warning("trace2: not opening %s trace file due to too "
+				"many files in target directory %s",
+				tr2_sysenv_display_name(dst->sysenv_var),
+				tgt_prefix);
+		return 0;
+	}
+
 	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
 		if (attempt_count > 0) {
 			strbuf_setlen(&path, base_path_len);
diff --git a/trace2/tr2_sysenv.c b/trace2/tr2_sysenv.c
index 5958cfc424..3c3792eca2 100644
--- a/trace2/tr2_sysenv.c
+++ b/trace2/tr2_sysenv.c
@@ -49,6 +49,9 @@ static struct tr2_sysenv_entry tr2_sysenv_settings[] = {
 				       "trace2.perftarget" },
 	[TR2_SYSENV_PERF_BRIEF]    = { "GIT_TRACE2_PERF_BRIEF",
 				       "trace2.perfbrief" },
+
+	[TR2_SYSENV_MAX_FILES]     = { "GIT_TRACE2_MAX_FILES",
+				       "trace2.maxfiles" },
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_sysenv.h b/trace2/tr2_sysenv.h
index 8dd82a7a56..d4364a7b85 100644
--- a/trace2/tr2_sysenv.h
+++ b/trace2/tr2_sysenv.h
@@ -24,6 +24,8 @@ enum tr2_sysenv_variable {
 	TR2_SYSENV_PERF,
 	TR2_SYSENV_PERF_BRIEF,
 
+	TR2_SYSENV_MAX_FILES,
+
 	TR2_SYSENV_MUST_BE_LAST
 };
 
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 4/4] trace2: write discard message to sentinel files
  2019-10-04 22:08 ` [PATCH v5 0/4] trace2: discard new traces if the target directory contains too many files Josh Steadmon
                     ` (2 preceding siblings ...)
  2019-10-04 22:08   ` [PATCH v5 3/4] trace2: discard new traces if target directory has too many files Josh Steadmon
@ 2019-10-04 22:08   ` Josh Steadmon
  3 siblings, 0 replies; 40+ messages in thread
From: Josh Steadmon @ 2019-10-04 22:08 UTC (permalink / raw)
  To: git; +Cc: git, stolee, gitster, Johannes.Schindelin

Add a new "discard" event type for trace2 event destinations. When the
trace2 file count check creates a sentinel file, it will include the
normal trace2 output in the sentinel, along with this new discard
event.

Writing this message into the sentinel file is useful for tracking how
often the file count check triggers in practice.

Bump up the event format version since we've added a new event type.

Change-Id: I0953c1a891ad93585ddcd8e22575ec34c0ebe73c
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt | 16 +++++++--
 t/t0212-trace2-event.sh                |  4 ++-
 trace2/tr2_dst.c                       | 47 ++++++++++++++------------
 trace2/tr2_dst.h                       |  1 +
 trace2/tr2_tgt_event.c                 | 31 +++++++++++++----
 trace2/tr2_tgt_normal.c                |  2 +-
 trace2/tr2_tgt_perf.c                  |  2 +-
 7 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 18b79128fd..a045dbe422 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -128,7 +128,7 @@ yields
 
 ------------
 $ cat ~/log.event
-{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"1","exe":"2.20.1.155.g426c96fcdb"}
+{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"2","exe":"2.20.1.155.g426c96fcdb"}
 {"event":"start","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621027Z","file":"common-main.c","line":39,"t_abs":0.001173,"argv":["git","version"]}
 {"event":"cmd_name","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621122Z","file":"git.c","line":432,"name":"version","hierarchy":"version"}
 {"event":"exit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621236Z","file":"git.c","line":662,"t_abs":0.001227,"code":0}
@@ -616,11 +616,23 @@ only present on the "start" and "atexit" events.
 {
 	"event":"version",
 	...
-	"evt":"1",		       # EVENT format version
+	"evt":"2",		       # EVENT format version
 	"exe":"2.20.1.155.g426c96fcdb" # git version
 }
 ------------
 
+`"discard"`::
+	This event is written to the git-trace2-discard sentinel file if there
+	are too many files in the target trace directory (see the
+	trace2.maxFiles config option).
++
+------------
+{
+	"event":"discard",
+	...
+}
+------------
+
 `"start"`::
 	This event contains the complete argv received by main().
 +
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index bf75e06e30..7065a1b937 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -279,7 +279,9 @@ test_expect_success 'discard traces when there are too many files' '
 	) &&
 	echo git-trace2-discard >>expected_filenames.txt &&
 	ls trace_target_dir >ls_output.txt &&
-	test_cmp expected_filenames.txt ls_output.txt
+	test_cmp expected_filenames.txt ls_output.txt &&
+	head -n1 trace_target_dir/git-trace2-discard | grep \"event\":\"version\" &&
+	head -n2 trace_target_dir/git-trace2-discard | tail -n1 | grep \"event\":\"too_many_files\"
 '
 
 test_done
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index f4646bf98d..741ec872a7 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -48,15 +48,19 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
 /*
  * Check to make sure we're not overloading the target directory with too many
  * files. First get the threshold (if present) from the config or envvar. If
- * it's zero or unset, disable this check.  Next check for the presence of a
- * sentinel file, then check file count. If we are overloaded, create the
- * sentinel file if it doesn't already exist.
+ * it's zero or unset, disable this check. Next check for the presence of a
+ * sentinel file, then check file count.
+ *
+ * Returns 0 if tracing should proceed as normal. Returns 1 if the sentinel file
+ * already exists, which means tracing should be disabled. Returns -1 if there
+ * are too many files but there was no sentinel file, which means we have
+ * created and should write traces to the sentinel file.
  *
  * We expect that some trace processing system is gradually collecting files
  * from the target directory; after it removes the sentinel file we'll start
  * writing traces again.
  */
-static int tr2_dst_too_many_files(const char *tgt_prefix)
+static int tr2_dst_too_many_files(struct tr2_dst *dst, const char *tgt_prefix)
 {
 	int file_count = 0, max_files = 0, ret = 0;
 	const char *max_files_var;
@@ -95,8 +99,9 @@ static int tr2_dst_too_many_files(const char *tgt_prefix)
 		closedir(dirp);
 
 	if (file_count >= tr2env_max_files) {
-		creat(sentinel_path.buf, 0666);
-		ret = 1;
+		dst->too_many_files = 1;
+		dst->fd = open(sentinel_path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
+		ret = -1;
 		goto cleanup;
 	}
 
@@ -108,7 +113,7 @@ static int tr2_dst_too_many_files(const char *tgt_prefix)
 
 static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 {
-	int fd;
+	int too_many_files;
 	const char *last_slash, *sid = tr2_sid_get();
 	struct strbuf path = STRBUF_INIT;
 	size_t base_path_len;
@@ -124,7 +129,19 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 	strbuf_addstr(&path, sid);
 	base_path_len = path.len;
 
-	if (tr2_dst_too_many_files(tgt_prefix)) {
+	too_many_files = tr2_dst_too_many_files(dst, tgt_prefix);
+	if (!too_many_files) {
+		for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
+			if (attempt_count > 0) {
+				strbuf_setlen(&path, base_path_len);
+				strbuf_addf(&path, ".%d", attempt_count);
+			}
+
+			dst->fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
+			if (dst->fd != -1)
+				break;
+		}
+	} else if (too_many_files == 1) {
 		strbuf_release(&path);
 		if (tr2_dst_want_warning())
 			warning("trace2: not opening %s trace file due to too "
@@ -134,18 +151,7 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 		return 0;
 	}
 
-	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
-		if (attempt_count > 0) {
-			strbuf_setlen(&path, base_path_len);
-			strbuf_addf(&path, ".%d", attempt_count);
-		}
-
-		fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
-		if (fd != -1)
-			break;
-	}
-
-	if (fd == -1) {
+	if (dst->fd == -1) {
 		if (tr2_dst_want_warning())
 			warning("trace2: could not open '%.*s' for '%s' tracing: %s",
 				(int) base_path_len, path.buf,
@@ -159,7 +165,6 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 
 	strbuf_release(&path);
 
-	dst->fd = fd;
 	dst->need_close = 1;
 	dst->initialized = 1;
 
diff --git a/trace2/tr2_dst.h b/trace2/tr2_dst.h
index 3adf3bac13..b1a8c144e0 100644
--- a/trace2/tr2_dst.h
+++ b/trace2/tr2_dst.h
@@ -9,6 +9,7 @@ struct tr2_dst {
 	int fd;
 	unsigned int initialized : 1;
 	unsigned int need_close : 1;
+	unsigned int too_many_files : 1;
 };
 
 /*
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index c2852d1bd2..424e6def3e 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -10,16 +10,17 @@
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
 
-static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0 };
+static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0, 0 };
 
 /*
- * The version number of the JSON data generated by the EVENT target
- * in this source file.  Update this if you make a significant change
- * to the JSON fields or message structure.  You probably do not need
- * to update this if you just add another call to one of the existing
- * TRACE2 API methods.
+ * The version number of the JSON data generated by the EVENT target in this
+ * source file. The version should be incremented if new event types are added,
+ * if existing fields are removed, or if there are significant changes in
+ * interpretation of existing events or fields. Smaller changes, such as adding
+ * a new field to an existing event, do not require an increment to the EVENT
+ * format version.
  */
-#define TR2_EVENT_VERSION "1"
+#define TR2_EVENT_VERSION "2"
 
 /*
  * Region nesting limit for messages written to the event target.
@@ -107,6 +108,19 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 		jw_object_intmax(jw, "repo", repo->trace2_repo_id);
 }
 
+static void fn_too_many_files_fl(const char *file, int line)
+{
+	const char *event_name = "too_many_files";
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 static void fn_version_fl(const char *file, int line)
 {
 	const char *event_name = "version";
@@ -120,6 +134,9 @@ static void fn_version_fl(const char *file, int line)
 
 	tr2_dst_write_line(&tr2dst_event, &jw.json);
 	jw_release(&jw);
+
+	if (tr2dst_event.too_many_files)
+		fn_too_many_files_fl(file, line);
 }
 
 static void fn_start_fl(const char *file, int line,
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 00b116d797..e91633468d 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -9,7 +9,7 @@
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
 
-static struct tr2_dst tr2dst_normal = { TR2_SYSENV_NORMAL, 0, 0, 0 };
+static struct tr2_dst tr2dst_normal = { TR2_SYSENV_NORMAL, 0, 0, 0, 0 };
 
 /*
  * Use the TR2_SYSENV_NORMAL_BRIEF setting to omit the "<time> <file>:<line>"
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index ea0cbbe13e..bebac4782d 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -11,7 +11,7 @@
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
 
-static struct tr2_dst tr2dst_perf = { TR2_SYSENV_PERF, 0, 0, 0 };
+static struct tr2_dst tr2dst_perf = { TR2_SYSENV_PERF, 0, 0, 0, 0 };
 
 /*
  * Use TR2_SYSENV_PERF_BRIEF to omit the "<time> <file>:<line>"
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2019-10-04 22:08 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-29 22:20 [RFC PATCH] trace2: don't overload target directories Josh Steadmon
2019-07-30 13:29 ` Derrick Stolee
2019-07-30 21:52   ` Josh Steadmon
2019-07-30 16:46 ` Jeff Hostetler
2019-07-30 22:01   ` Josh Steadmon
2019-07-30 22:02   ` Josh Steadmon
2019-07-30 18:00 ` Jeff Hostetler
2019-07-30 22:08   ` Josh Steadmon
2019-08-02 22:02 ` [RFC PATCH v2 0/2] " Josh Steadmon
2019-08-02 22:02   ` [RFC PATCH v2 1/2] docs: mention trace2 target-dir mode in git-config Josh Steadmon
2019-08-02 22:02   ` [RFC PATCH v2 2/2] trace2: don't overload target directories Josh Steadmon
2019-08-05 15:34     ` Jeff Hostetler
2019-08-05 18:17       ` Josh Steadmon
2019-08-05 18:01     ` SZEDER Gábor
2019-08-05 18:09       ` Josh Steadmon
2019-09-14  0:25 ` [RFC PATCH v3 0/3] " Josh Steadmon
2019-09-14  0:25   ` [RFC PATCH v3 1/3] docs: mention trace2 target-dir mode in git-config Josh Steadmon
2019-09-14  0:25   ` [RFC PATCH v3 2/3] trace2: don't overload target directories Josh Steadmon
2019-09-14  0:26   ` [RFC PATCH v3 3/3] trace2: write overload message to sentinel files Josh Steadmon
2019-09-16 12:07     ` Derrick Stolee
2019-09-16 14:11       ` Jeff Hostetler
2019-09-16 18:20         ` Josh Steadmon
2019-09-19 18:23           ` Jeff Hostetler
2019-09-19 22:47             ` Josh Steadmon
2019-09-20 15:59               ` Jeff Hostetler
2019-09-16 18:07       ` Josh Steadmon
2019-10-03 23:32 ` [PATCH v4 0/4] trace2: don't overload target directories Josh Steadmon
2019-10-03 23:32   ` [PATCH v4 1/4] docs: mention trace2 target-dir mode in git-config Josh Steadmon
2019-10-03 23:32   ` [PATCH v4 2/4] docs: clarify trace2 version invariants Josh Steadmon
2019-10-03 23:32   ` [PATCH v4 3/4] trace2: don't overload target directories Josh Steadmon
2019-10-04  0:25     ` Junio C Hamano
2019-10-04 21:57       ` Josh Steadmon
2019-10-04  9:12     ` Johannes Schindelin
2019-10-04 22:05       ` Josh Steadmon
2019-10-03 23:32   ` [PATCH v4 4/4] trace2: write overload message to sentinel files Josh Steadmon
2019-10-04 22:08 ` [PATCH v5 0/4] trace2: discard new traces if the target directory contains too many files Josh Steadmon
2019-10-04 22:08   ` [PATCH v5 1/4] docs: mention trace2 target-dir mode in git-config Josh Steadmon
2019-10-04 22:08   ` [PATCH v5 2/4] docs: clarify trace2 version invariants Josh Steadmon
2019-10-04 22:08   ` [PATCH v5 3/4] trace2: discard new traces if target directory has too many files Josh Steadmon
2019-10-04 22:08   ` [PATCH v5 4/4] trace2: write discard message to sentinel files Josh Steadmon

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).