git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Randomize / timestamp trace2 targets
@ 2019-03-13 23:33 Josh Steadmon
  2019-03-13 23:33 ` [PATCH 1/2] date: make get_time() public Josh Steadmon
                   ` (5 more replies)
  0 siblings, 6 replies; 34+ messages in thread
From: Josh Steadmon @ 2019-03-13 23:33 UTC (permalink / raw)
  To: git, git

Persistently enabling trace2 output is difficult because it requires
specifying a full filename. This series teaches tr2_dst_get_trace_fd()
to randomize filenames when a directory or filename prefix are given as
targets in the GIT_TR2_* envvars. It also allows expansion of a
timestamp template string into the current UTC timestamp.

Josh Steadmon (2):
  date: make get_time() public
  trace2: randomize/timestamp trace2 targets

 Documentation/technical/api-trace2.txt | 10 +++
 cache.h                                |  1 +
 date.c                                 |  2 +-
 t/t0210-trace2-normal.sh               | 93 ++++++++++++++++++++++++++
 trace2/tr2_dst.c                       | 86 +++++++++++++++++++++++-
 5 files changed, 189 insertions(+), 3 deletions(-)

-- 
2.21.0.360.g471c308f928-goog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/2] date: make get_time() public
  2019-03-13 23:33 [PATCH 0/2] Randomize / timestamp trace2 targets Josh Steadmon
@ 2019-03-13 23:33 ` Josh Steadmon
  2019-03-13 23:33 ` [PATCH 2/2] trace2: randomize/timestamp trace2 targets Josh Steadmon
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 34+ messages in thread
From: Josh Steadmon @ 2019-03-13 23:33 UTC (permalink / raw)
  To: git, git

get_time() is the standard way to get the current time while also
respecting frozen timestamps for tests. Expose this for use in other
files.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 cache.h | 1 +
 date.c  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index abd518a9a2..0a2c436d0a 100644
--- a/cache.h
+++ b/cache.h
@@ -1489,6 +1489,7 @@ struct date_mode {
 #define DATE_MODE(t) date_mode_from_type(DATE_##t)
 struct date_mode *date_mode_from_type(enum date_mode_type type);
 
+void get_time(struct timeval *now);
 const char *show_date(timestamp_t time, int timezone, const struct date_mode *mode);
 void show_date_relative(timestamp_t time, const struct timeval *now,
 			struct strbuf *timebuf);
diff --git a/date.c b/date.c
index 8126146c50..33576d2deb 100644
--- a/date.c
+++ b/date.c
@@ -115,7 +115,7 @@ static int local_tzoffset(timestamp_t time)
 	return local_time_tzoffset((time_t)time, &tm);
 }
 
-static void get_time(struct timeval *now)
+void get_time(struct timeval *now)
 {
 	const char *x;
 
-- 
2.21.0.360.g471c308f928-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-13 23:33 [PATCH 0/2] Randomize / timestamp trace2 targets Josh Steadmon
  2019-03-13 23:33 ` [PATCH 1/2] date: make get_time() public Josh Steadmon
@ 2019-03-13 23:33 ` Josh Steadmon
  2019-03-13 23:49   ` Ævar Arnfjörð Bjarmason
  2019-03-14  0:16   ` Jeff King
  2019-03-14 14:34 ` [PATCH 0/2] Randomize / timestamp " Johannes Schindelin
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 34+ messages in thread
From: Josh Steadmon @ 2019-03-13 23:33 UTC (permalink / raw)
  To: git, git

When the value of a trace2 environment variable contains instances of
the string "%ISO8601%", expand them into the current UTC timestamp in
ISO 8601 format.

When the value of a trace2 environment variable is an absolute path
referring to an existing directory, write output to randomly-named
files under the given directory. If the value is an absolute path
referring to a non-existent file and ends with a dash, use the value as
a prefix for randomly named files.

The random filenames will consist of the value of the environment
variable (after potential timestamp expansion), followed by a 6
character random string such as would be produced by mkstemp(3).

This makes it more convenient to collect traces for every git
invocation by unconditionally setting the relevant trace2 envvar to a
constant directory name.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt | 10 +++
 t/t0210-trace2-normal.sh               | 93 ++++++++++++++++++++++++++
 trace2/tr2_dst.c                       | 86 +++++++++++++++++++++++-
 3 files changed, 187 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 2de565fa3d..1362bf7d0b 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -109,6 +109,16 @@ values are recognized.
 
 	Enables the target, opens and writes to the file in append mode.
 
+	If the path includes any instances of the string "%ISO8601%", they will
+	be replaced with the current UTC timestamp in ISO 8601 format with
+	dashes and colons removed, e.g., "20190315T143059Z".
+
+	If (after potential timestamp expansion) the path already exists and is
+	a directory, the traces will write to randomly-named files (one per
+	process) under the given directory. If the pathname does not already
+	exist and ends with a dash, it will be used as a prefix for
+	randomly-named files (one per process).
+
 `af_unix:[<socket_type>:]<absolute-pathname>`::
 
 	Enables the target, opens and writes to a Unix Domain Socket
diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index 03a0aedb1d..1b992c3e61 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -80,6 +80,99 @@ test_expect_success 'normal stream, return code 1' '
 	test_cmp expect actual
 '
 
+test_expect_success 'randomized filename' '
+	test_when_finished "rm -r traces actual expect" &&
+	mkdir traces &&
+	GIT_TR2="$(pwd)/traces" test-tool trace2 001return 0 &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <"$(ls traces/??????)" >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 001return 0
+		cmd_name trace2 (trace2)
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'randomized filename with prefix' '
+	test_when_finished "rm -r traces actual expect" &&
+	mkdir traces &&
+	GIT_TR2="$(pwd)/traces/trace-" test-tool trace2 001return 0 &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <"$(ls traces/trace-??????)" >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 001return 0
+		cmd_name trace2 (trace2)
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'timestamped filename' '
+	test_when_finished "rm -r traces actual expect" &&
+	mkdir traces &&
+	GIT_TEST_DATE_NOW=1552658399 GIT_TR2="$(pwd)/traces/trace.%ISO8601%" \
+		test-tool trace2 001return 0 &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <traces/trace.20190315T135959Z >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 001return 0
+		cmd_name trace2 (trace2)
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'multiple timestamps' '
+	test_when_finished "rm -r traces actual expect" &&
+	mkdir -p traces/20190315T135959Z &&
+	GIT_TEST_DATE_NOW=1552658399 GIT_TR2="$(pwd)/traces/%ISO8601%/trace.%ISO8601%" \
+		test-tool trace2 001return 0 &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <traces/20190315T135959Z/trace.20190315T135959Z >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 001return 0
+		cmd_name trace2 (trace2)
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'timestamp plus randomization' '
+	test_when_finished "rm -r traces actual expect" &&
+	mkdir traces &&
+	GIT_TEST_DATE_NOW=1552658399 GIT_TR2="$(pwd)/traces/trace-%ISO8601%-" test-tool trace2 001return 0 &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <"$(ls traces/trace-20190315T135959Z-??????)" >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 001return 0
+		cmd_name trace2 (trace2)
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+test_expect_success 'no randomization if target exists' '
+	test_when_finished "rm -r traces actual expect" &&
+	mkdir traces &&
+	touch traces/trace- &&
+	GIT_TR2="$(pwd)/traces/trace-" test-tool trace2 001return 0 &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <traces/trace- >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 001return 0
+		cmd_name trace2 (trace2)
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+
+
 # Verb 002exit
 #
 # Explicit exit(code) from within cmd_<verb> propagates <code>.
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index fd490a43ad..27405a92b4 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -12,6 +12,11 @@
  */
 #define TR2_ENVVAR_DST_DEBUG "GIT_TR2_DST_DEBUG"
 
+/* Constant string used for timestamp replacement in output destination
+ * filenames. See tr2_replace_timestamp_templates() below.
+ */
+static const char *iso_timestamp_tmpl = "%ISO8601%";
+
 static int tr2_dst_want_warning(void)
 {
 	static int tr2env_dst_debug = -1;
@@ -55,6 +60,64 @@ static int tr2_dst_try_path(struct tr2_dst *dst, const char *tgt_value)
 	return dst->fd;
 }
 
+static void tr2_replace_timestamp_templates(const char *path,
+					    struct strbuf *modified_path)
+{
+	char *iso_ptr;
+
+	strbuf_addstr(modified_path, path);
+
+	iso_ptr = strstr(modified_path->buf, iso_timestamp_tmpl);
+	if (iso_ptr) {
+		struct timeval tv;
+		struct tm tm;
+		size_t iso_len = strlen(iso_timestamp_tmpl);
+		struct strbuf timestamp = STRBUF_INIT;
+
+		get_time(&tv);
+		gmtime_r(&tv.tv_sec, &tm);
+		strbuf_addftime(&timestamp, "%Y%m%dT%H%M%SZ", &tm, 0, 0);
+
+		while (iso_ptr) {
+			strbuf_splice(modified_path,
+				      iso_ptr - modified_path->buf, iso_len,
+				      timestamp.buf, timestamp.len);
+
+			iso_ptr = strstr(modified_path->buf,
+					 iso_timestamp_tmpl);
+		}
+
+		strbuf_release(&timestamp);
+	}
+}
+
+static int tr2_dst_try_random_path(struct tr2_dst *dst, struct strbuf *path)
+{
+	int fd;
+	char last_path_char;
+
+	last_path_char = path->buf[path->len - 1];
+	if (!is_dir_sep(last_path_char) && last_path_char != '-')
+		strbuf_addch(path, '/');
+
+	strbuf_addstr(path, "XXXXXX");
+
+	fd = mkstemp(path->buf);
+	if (fd == -1) {
+		if (tr2_dst_want_warning())
+			warning("trace2: could not open '%s' for '%s' tracing: %s",
+				path->buf, dst->env_var_name, strerror(errno));
+		tr2_dst_trace_disable(dst);
+		return 0;
+	}
+
+	dst->fd = fd;
+	dst->need_close = 1;
+	dst->initialized = 1;
+
+	return dst->fd;
+}
+
 #ifndef NO_UNIX_SOCKETS
 #define PREFIX_AF_UNIX "af_unix:"
 #define PREFIX_AF_UNIX_STREAM "af_unix:stream:"
@@ -177,6 +240,7 @@ static void tr2_dst_malformed_warning(struct tr2_dst *dst,
 int tr2_dst_get_trace_fd(struct tr2_dst *dst)
 {
 	const char *tgt_value;
+	struct stat st;
 
 	/* don't open twice */
 	if (dst->initialized)
@@ -202,8 +266,26 @@ int tr2_dst_get_trace_fd(struct tr2_dst *dst)
 		return dst->fd;
 	}
 
-	if (is_absolute_path(tgt_value))
-		return tr2_dst_try_path(dst, tgt_value);
+	if (is_absolute_path(tgt_value)) {
+		int fd;
+		struct strbuf modified_path = STRBUF_INIT;
+
+		tr2_replace_timestamp_templates(tgt_value, &modified_path);
+
+		/*
+		 * Randomize the path if it is an existing directory, or if the
+		 * path does not exist and ends with '-'.
+		 */
+		if (is_directory(modified_path.buf) ||
+		    (stat(modified_path.buf, &st) == -1 && errno == ENOENT &&
+		     modified_path.buf[modified_path.len - 1] == '-'))
+			fd = tr2_dst_try_random_path(dst, &modified_path);
+		else
+			fd = tr2_dst_try_path(dst, modified_path.buf);
+
+		strbuf_release(&modified_path);
+		return fd;
+	}
 
 #ifndef NO_UNIX_SOCKETS
 	if (starts_with(tgt_value, PREFIX_AF_UNIX))
-- 
2.21.0.360.g471c308f928-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-13 23:33 ` [PATCH 2/2] trace2: randomize/timestamp trace2 targets Josh Steadmon
@ 2019-03-13 23:49   ` Ævar Arnfjörð Bjarmason
  2019-03-15 18:39     ` Jeff Hostetler
  2019-03-15 20:43     ` Josh Steadmon
  2019-03-14  0:16   ` Jeff King
  1 sibling, 2 replies; 34+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-13 23:49 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, git


On Thu, Mar 14 2019, Josh Steadmon wrote:

> When the value of a trace2 environment variable contains instances of
> the string "%ISO8601%", expand them into the current UTC timestamp in
> ISO 8601 format.

Any reason not to just support feeding the path to strbuf_addftime(), to
e.g. support a daily/hourly log?

> When the value of a trace2 environment variable is an absolute path
> referring to an existing directory, write output to randomly-named
> files under the given directory. If the value is an absolute path
> referring to a non-existent file and ends with a dash, use the value as
> a prefix for randomly named files.
>
> The random filenames will consist of the value of the environment
> variable (after potential timestamp expansion), followed by a 6
> character random string such as would be produced by mkstemp(3).
>
> This makes it more convenient to collect traces for every git
> invocation by unconditionally setting the relevant trace2 envvar to a
> constant directory name.

Hrm, api-trace2.txt already specifies that the "sid" is going to be
unique, couldn't we just have some mode where we use that?

But then of course when we have nested processes will contain slashes,
so we'd either run into deep nesting or need to munge the slashes, in
which case we might bump against a file length limit (although I haven't
seen process trees deeper than 3-4).

Just to pry about the use-case since I'm doing similar collecting, why
are you finding this easier to process?

With the current O_APPEND semantics you're (unless I've missed
something) guaranteed to get a single process tree in nested order,
whereas with this they'll all end up in separate files and you'll need
to slurp them up, sort the whole thing and stitch it together yourself
without the benefit of stream-parsing it where you can cheat a bit
knowing that e.g. a "reflog expire" entry is always coming after the
corresponding "gc" that invoked it.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-13 23:33 ` [PATCH 2/2] trace2: randomize/timestamp trace2 targets Josh Steadmon
  2019-03-13 23:49   ` Ævar Arnfjörð Bjarmason
@ 2019-03-14  0:16   ` Jeff King
  2019-03-14  6:07     ` Junio C Hamano
  1 sibling, 1 reply; 34+ messages in thread
From: Jeff King @ 2019-03-14  0:16 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, git

On Wed, Mar 13, 2019 at 04:33:29PM -0700, Josh Steadmon wrote:

> When the value of a trace2 environment variable contains instances of
> the string "%ISO8601%", expand them into the current UTC timestamp in
> ISO 8601 format.

This definitely seems useful. Could we drop the final "%" and make it
either a single-character "%t" or "%(iso8601)" to match our other
formatting strings? There's no _technical_ reason to do that, but it
just seems like there's not much point in being unnecessarily
inconsistent.

> When the value of a trace2 environment variable is an absolute path
> referring to an existing directory, write output to randomly-named
> files under the given directory. If the value is an absolute path
> referring to a non-existent file and ends with a dash, use the value as
> a prefix for randomly named files.
> 
> The random filenames will consist of the value of the environment
> variable (after potential timestamp expansion), followed by a 6
> character random string such as would be produced by mkstemp(3).

Doing this automatically for directories feels kind of magical. I'd have
expected it to be just another placeholder. And in fact, I'd think using
the process id as a placeholder would be pretty common. Between
timestamp and pid, that's generally unique (though I'm not opposed to
the random placeholder if somebody really wants to be thorough).

That leaves the door open for being able to append to or overwrite
existing trace files later, if we want to make that a possibility.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-14  0:16   ` Jeff King
@ 2019-03-14  6:07     ` Junio C Hamano
  0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2019-03-14  6:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Josh Steadmon, git, git

Jeff King <peff@peff.net> writes:

> This definitely seems useful. Could we drop the final "%" and make it
> either a single-character "%t" or "%(iso8601)" to match our other
> formatting strings? There's no _technical_ reason to do that, but it
> just seems like there's not much point in being unnecessarily
> inconsistent.

I do agree that %TOKEN% is not among the patterns we've used before,
and I fully anticipated that it would attract paintbrushes when the
proposal hits the public list ;-)

A big question is what we want to be consisten with, though.  From
the readability point-of-view, for-each-ref language %(token) is
easier to read and extend.  A secondary question is that there
likely are things other than the timestamp of the time process
started that may want to be interpolated, so we may want to pick
some useful vocabulary upfront.  If we can declare interpolation
will be done ONLY for timestamps, as Ævar suggests, taking strftime
pattern directly from the caller may be sufficiently simple to
explain and useful, but I am not sure "only for timestamps" is a
limitation we would want to adopt this early in the design process.

> Doing this automatically for directories feels kind of magical. I'd have
> expected it to be just another placeholder. And in fact, I'd think using
> the process id as a placeholder would be pretty common. Between
> timestamp and pid, that's generally unique (though I'm not opposed to
> the random placeholder if somebody really wants to be thorough).
>
> That leaves the door open for being able to append to or overwrite
> existing trace files later, if we want to make that a possibility.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/2] Randomize / timestamp trace2 targets
  2019-03-13 23:33 [PATCH 0/2] Randomize / timestamp trace2 targets Josh Steadmon
  2019-03-13 23:33 ` [PATCH 1/2] date: make get_time() public Josh Steadmon
  2019-03-13 23:33 ` [PATCH 2/2] trace2: randomize/timestamp trace2 targets Josh Steadmon
@ 2019-03-14 14:34 ` Johannes Schindelin
  2019-03-15 20:37   ` Josh Steadmon
  2019-03-15 19:18 ` Jeff Hostetler
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2019-03-14 14:34 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, git

Hi Josh,

On Wed, 13 Mar 2019, Josh Steadmon wrote:

> Persistently enabling trace2 output is difficult because it requires
> specifying a full filename. This series teaches tr2_dst_get_trace_fd()
> to randomize filenames when a directory or filename prefix are given as
> targets in the GIT_TR2_* envvars. It also allows expansion of a
> timestamp template string into the current UTC timestamp.

Given the problem you try to solve, would it not make more sense to use
`getnanotime()` than some randomized file name or an ISO date format?

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-13 23:49   ` Ævar Arnfjörð Bjarmason
@ 2019-03-15 18:39     ` Jeff Hostetler
  2019-03-15 19:26       ` Ævar Arnfjörð Bjarmason
  2019-03-15 20:43     ` Josh Steadmon
  1 sibling, 1 reply; 34+ messages in thread
From: Jeff Hostetler @ 2019-03-15 18:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Josh Steadmon; +Cc: git



On 3/13/2019 7:49 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Mar 14 2019, Josh Steadmon wrote:
> 
>> When the value of a trace2 environment variable contains instances of
>> the string "%ISO8601%", expand them into the current UTC timestamp in
>> ISO 8601 format.
> 
> Any reason not to just support feeding the path to strbuf_addftime(), to
> e.g. support a daily/hourly log?
> 
>> When the value of a trace2 environment variable is an absolute path
>> referring to an existing directory, write output to randomly-named
>> files under the given directory. If the value is an absolute path
>> referring to a non-existent file and ends with a dash, use the value as
>> a prefix for randomly named files.
>>
>> The random filenames will consist of the value of the environment
>> variable (after potential timestamp expansion), followed by a 6
>> character random string such as would be produced by mkstemp(3).
>>
>> This makes it more convenient to collect traces for every git
>> invocation by unconditionally setting the relevant trace2 envvar to a
>> constant directory name.
> 
> Hrm, api-trace2.txt already specifies that the "sid" is going to be
> unique, couldn't we just have some mode where we use that?
> 
> But then of course when we have nested processes will contain slashes,
> so we'd either run into deep nesting or need to munge the slashes, in
> which case we might bump against a file length limit (although I haven't
> seen process trees deeper than 3-4).

Using the "sid" would be a good place to start.  Just take the final
component in the string (after the last slash or the whole sid if there
are no slashes).  That will give you a filename with microseconds since
epoch of the command's start time and the PID.

That should be unique, should not require random strings, and not go
deep in the filesystem.  And it will let you correlate files between
child and parent commands, if you need to.

So maybe if GIT_TR2_* is set to a directory, we append the final portion
of the "sid" and create a file inside that directory.

> 
> Just to pry about the use-case since I'm doing similar collecting, why
> are you finding this easier to process?
> 
> With the current O_APPEND semantics you're (unless I've missed
> something) guaranteed to get a single process tree in nested order,
> whereas with this they'll all end up in separate files and you'll need
> to slurp them up, sort the whole thing and stitch it together yourself
> without the benefit of stream-parsing it where you can cheat a bit
> knowing that e.g. a "reflog expire" entry is always coming after the
> corresponding "gc" that invoked it.
> 

Yes, with O_APPEND, you should get a series of events as they happen
on the system all properly interleaved.  And see concurrent activity.
This file should let you grep to see individual processes if you want
to.

Routing each command to a different file is fine if you want, but
that opens you up to having to manage and delete them.

Whether to have 1 file (with occasional rotation) or 1 file-per-command
depends, I guess, on how you want to process them.

I'm routing the Trace2 data to a named-pipe/socket and have a daemon
collecting and filtering, so I have a single pathname for output and
yet get the per-file stream handling that I think Josh is looking for.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/2] Randomize / timestamp trace2 targets
  2019-03-13 23:33 [PATCH 0/2] Randomize / timestamp trace2 targets Josh Steadmon
                   ` (2 preceding siblings ...)
  2019-03-14 14:34 ` [PATCH 0/2] Randomize / timestamp " Johannes Schindelin
@ 2019-03-15 19:18 ` Jeff Hostetler
  2019-03-15 20:38   ` Josh Steadmon
  2019-03-21  0:16 ` [PATCH v2 0/1] Write trace2 output to directories Josh Steadmon
  2019-03-21 21:09 ` [PATCH v3 0/1] Write trace2 output to directories Josh Steadmon
  5 siblings, 1 reply; 34+ messages in thread
From: Jeff Hostetler @ 2019-03-15 19:18 UTC (permalink / raw)
  To: Josh Steadmon, git; +Cc: Ævar Arnfjörð Bjarmason



On 3/13/2019 7:33 PM, Josh Steadmon wrote:
> Persistently enabling trace2 output is difficult because it requires
> specifying a full filename. This series teaches tr2_dst_get_trace_fd()
> to randomize filenames when a directory or filename prefix are given as
> targets in the GIT_TR2_* envvars. It also allows expansion of a
> timestamp template string into the current UTC timestamp.


Does the use of envvar cause issues?  Or is it just the fixed absolute
pathname?  When I started this, I was trying to keep the GIT_TRACE
model.

As was briefly discussed in [1] I was thinking of adding a way to
have a personal and/or system setting, but not a git config variable,
that would let you setup tracing without relying on environment
variables.

It's something that I've been wanting to do, it just got buried by
$DAYJOB stuff.  I can try to bump it up if there's interest.

Thanks
Jeff

[1] 
https://public-inbox.org/git/pull.108.v4.git.gitgitgadget@gmail.com/T/#mcce3b088aabc42e9867f8a9e09dd7dea5f9a773a

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-15 18:39     ` Jeff Hostetler
@ 2019-03-15 19:26       ` Ævar Arnfjörð Bjarmason
  2019-03-15 20:14         ` Jeff Hostetler
  0 siblings, 1 reply; 34+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-15 19:26 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Josh Steadmon, git


On Fri, Mar 15 2019, Jeff Hostetler wrote:

> On 3/13/2019 7:49 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, Mar 14 2019, Josh Steadmon wrote:
>>
>>> When the value of a trace2 environment variable contains instances of
>>> the string "%ISO8601%", expand them into the current UTC timestamp in
>>> ISO 8601 format.
>>
>> Any reason not to just support feeding the path to strbuf_addftime(), to
>> e.g. support a daily/hourly log?
>>
>>> When the value of a trace2 environment variable is an absolute path
>>> referring to an existing directory, write output to randomly-named
>>> files under the given directory. If the value is an absolute path
>>> referring to a non-existent file and ends with a dash, use the value as
>>> a prefix for randomly named files.
>>>
>>> The random filenames will consist of the value of the environment
>>> variable (after potential timestamp expansion), followed by a 6
>>> character random string such as would be produced by mkstemp(3).
>>>
>>> This makes it more convenient to collect traces for every git
>>> invocation by unconditionally setting the relevant trace2 envvar to a
>>> constant directory name.
>>
>> Hrm, api-trace2.txt already specifies that the "sid" is going to be
>> unique, couldn't we just have some mode where we use that?
>>
>> But then of course when we have nested processes will contain slashes,
>> so we'd either run into deep nesting or need to munge the slashes, in
>> which case we might bump against a file length limit (although I haven't
>> seen process trees deeper than 3-4).
>
> Using the "sid" would be a good place to start.  Just take the final
> component in the string (after the last slash or the whole sid if there
> are no slashes).  That will give you a filename with microseconds since
> epoch of the command's start time and the PID.
>
> That should be unique, should not require random strings, and not go
> deep in the filesystem.  And it will let you correlate files between
> child and parent commands, if you need to.
>
> So maybe if GIT_TR2_* is set to a directory, we append the final portion
> of the "sid" and create a file inside that directory.
>
>>
>> Just to pry about the use-case since I'm doing similar collecting, why
>> are you finding this easier to process?
>>
>> With the current O_APPEND semantics you're (unless I've missed
>> something) guaranteed to get a single process tree in nested order,
>> whereas with this they'll all end up in separate files and you'll need
>> to slurp them up, sort the whole thing and stitch it together yourself
>> without the benefit of stream-parsing it where you can cheat a bit
>> knowing that e.g. a "reflog expire" entry is always coming after the
>> corresponding "gc" that invoked it.
>>
>
> Yes, with O_APPEND, you should get a series of events as they happen
> on the system all properly interleaved.  And see concurrent activity.
> This file should let you grep to see individual processes if you want
> to.
>
> Routing each command to a different file is fine if you want, but
> that opens you up to having to manage and delete them.
>
> Whether to have 1 file (with occasional rotation) or 1 file-per-command
> depends, I guess, on how you want to process them.
>
> I'm routing the Trace2 data to a named-pipe/socket and have a daemon
> collecting and filtering, so I have a single pathname for output and
> yet get the per-file stream handling that I think Josh is looking for.

Is the collecting code something you can share & general enough that it
might be useful for others?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-15 19:26       ` Ævar Arnfjörð Bjarmason
@ 2019-03-15 20:14         ` Jeff Hostetler
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff Hostetler @ 2019-03-15 20:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Josh Steadmon, git



On 3/15/2019 3:26 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 15 2019, Jeff Hostetler wrote:
> 
[...]
>>
>> I'm routing the Trace2 data to a named-pipe/socket and have a daemon
>> collecting and filtering, so I have a single pathname for output and
>> yet get the per-file stream handling that I think Josh is looking for.
> 
> Is the collecting code something you can share & general enough that it
> might be useful for others?
> 

Yes, we're currently talking about releasing the source for that too.
It probably won't happen this month, maybe early next quarter.

Jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/2] Randomize / timestamp trace2 targets
  2019-03-14 14:34 ` [PATCH 0/2] Randomize / timestamp " Johannes Schindelin
@ 2019-03-15 20:37   ` Josh Steadmon
  0 siblings, 0 replies; 34+ messages in thread
From: Josh Steadmon @ 2019-03-15 20:37 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, git

On 2019.03.14 15:34, Johannes Schindelin wrote:
> Hi Josh,
> 
> On Wed, 13 Mar 2019, Josh Steadmon wrote:
> 
> > Persistently enabling trace2 output is difficult because it requires
> > specifying a full filename. This series teaches tr2_dst_get_trace_fd()
> > to randomize filenames when a directory or filename prefix are given as
> > targets in the GIT_TR2_* envvars. It also allows expansion of a
> > timestamp template string into the current UTC timestamp.
> 
> Given the problem you try to solve, would it not make more sense to use
> `getnanotime()` than some randomized file name or an ISO date format?

For my specific use case, getnanotime() would probably be sufficient.
However, others requested the timestamp feature before I sent this to
the list. I don't know whether they had a specific use case in mind.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/2] Randomize / timestamp trace2 targets
  2019-03-15 19:18 ` Jeff Hostetler
@ 2019-03-15 20:38   ` Josh Steadmon
  2019-03-18 12:50     ` Jeff Hostetler
  0 siblings, 1 reply; 34+ messages in thread
From: Josh Steadmon @ 2019-03-15 20:38 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, Ævar Arnfjörð Bjarmason

On 2019.03.15 15:18, Jeff Hostetler wrote:
> 
> 
> On 3/13/2019 7:33 PM, Josh Steadmon wrote:
> > Persistently enabling trace2 output is difficult because it requires
> > specifying a full filename. This series teaches tr2_dst_get_trace_fd()
> > to randomize filenames when a directory or filename prefix are given as
> > targets in the GIT_TR2_* envvars. It also allows expansion of a
> > timestamp template string into the current UTC timestamp.
> 
> 
> Does the use of envvar cause issues?  Or is it just the fixed absolute
> pathname?  When I started this, I was trying to keep the GIT_TRACE
> model.
> 
> As was briefly discussed in [1] I was thinking of adding a way to
> have a personal and/or system setting, but not a git config variable,
> that would let you setup tracing without relying on environment
> variables.
> 
> It's something that I've been wanting to do, it just got buried by
> $DAYJOB stuff.  I can try to bump it up if there's interest.
> 
> Thanks
> Jeff
> 
> [1] https://public-inbox.org/git/pull.108.v4.git.gitgitgadget@gmail.com/T/#mcce3b088aabc42e9867f8a9e09dd7dea5f9a773a

Yeah in our case it's the fixed filename that's the issue. It's more
convenient for us to have lots of small files that are "finished"
relatively quickly, rather than one that is constantly appended to.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-13 23:49   ` Ævar Arnfjörð Bjarmason
  2019-03-15 18:39     ` Jeff Hostetler
@ 2019-03-15 20:43     ` Josh Steadmon
  2019-03-15 20:49       ` Josh Steadmon
  1 sibling, 1 reply; 34+ messages in thread
From: Josh Steadmon @ 2019-03-15 20:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, git

On 2019.03.14 00:49, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Mar 14 2019, Josh Steadmon wrote:
> 
> > When the value of a trace2 environment variable contains instances of
> > the string "%ISO8601%", expand them into the current UTC timestamp in
> > ISO 8601 format.
> 
> Any reason not to just support feeding the path to strbuf_addftime(), to
> e.g. support a daily/hourly log?

No reason not to. Seems reasonable to me.

> > When the value of a trace2 environment variable is an absolute path
> > referring to an existing directory, write output to randomly-named
> > files under the given directory. If the value is an absolute path
> > referring to a non-existent file and ends with a dash, use the value as
> > a prefix for randomly named files.
> >
> > The random filenames will consist of the value of the environment
> > variable (after potential timestamp expansion), followed by a 6
> > character random string such as would be produced by mkstemp(3).
> >
> > This makes it more convenient to collect traces for every git
> > invocation by unconditionally setting the relevant trace2 envvar to a
> > constant directory name.
> 
> Hrm, api-trace2.txt already specifies that the "sid" is going to be
> unique, couldn't we just have some mode where we use that?
> 
> But then of course when we have nested processes will contain slashes,
> so we'd either run into deep nesting or need to munge the slashes, in
> which case we might bump against a file length limit (although I haven't
> seen process trees deeper than 3-4).
> 
> Just to pry about the use-case since I'm doing similar collecting, why
> are you finding this easier to process?

Basically, our collection setup prefers smaller files that are
"finished" earlier, rather than long-lived files that are constantly
appended to.

> With the current O_APPEND semantics you're (unless I've missed
> something) guaranteed to get a single process tree in nested order,
> whereas with this they'll all end up in separate files and you'll need
> to slurp them up, sort the whole thing and stitch it together yourself
> without the benefit of stream-parsing it where you can cheat a bit
> knowing that e.g. a "reflog expire" entry is always coming after the
> corresponding "gc" that invoked it.

Yeah, that is not an issue for us, although I can see why others would
prefer single file. I suppose we can just modify the envvar to point to
our newly-generated file before we spawn any child processes?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-15 20:43     ` Josh Steadmon
@ 2019-03-15 20:49       ` Josh Steadmon
  2019-03-18  1:40         ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Josh Steadmon @ 2019-03-15 20:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git, git

On 2019.03.15 13:43, Josh Steadmon wrote:
> On 2019.03.14 00:49, Ævar Arnfjörð Bjarmason wrote:
> > 
> > On Thu, Mar 14 2019, Josh Steadmon wrote:
> > 
> > > When the value of a trace2 environment variable contains instances of
> > > the string "%ISO8601%", expand them into the current UTC timestamp in
> > > ISO 8601 format.
> > 
> > Any reason not to just support feeding the path to strbuf_addftime(), to
> > e.g. support a daily/hourly log?
> 
> No reason not to. Seems reasonable to me.

Although as Junio says elsewhere in this thread, it's possible that we
may want to support fields other than timestamps.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-15 20:49       ` Josh Steadmon
@ 2019-03-18  1:40         ` Junio C Hamano
  2019-03-19  3:17           ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2019-03-18  1:40 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Ævar Arnfjörð Bjarmason, git, git

Josh Steadmon <steadmon@google.com> writes:

> On 2019.03.15 13:43, Josh Steadmon wrote:
>> On 2019.03.14 00:49, Ævar Arnfjörð Bjarmason wrote:
>> > 
>> > On Thu, Mar 14 2019, Josh Steadmon wrote:
>> > 
>> > > When the value of a trace2 environment variable contains instances of
>> > > the string "%ISO8601%", expand them into the current UTC timestamp in
>> > > ISO 8601 format.
>> > 
>> > Any reason not to just support feeding the path to strbuf_addftime(), to
>> > e.g. support a daily/hourly log?
>> 
>> No reason not to. Seems reasonable to me.
>
> Although as Junio says elsewhere in this thread, it's possible that we
> may want to support fields other than timestamps.

Yup.

On the other hand.

After seeing that the possibilities got discussed on the list, and
that nobody seems to be very much demanding customizability (I am
taking Ævar's mention of strftime as a mere "if we were doing an
optional timestamp, why not do so in an even more customizable way?"
nice-to-have, not as a "we must allow hourly or daily log, adjusting
for each host's needs" must-have), I actually am fine if we declare
that we've chosen the hard-coded "if it is a directory, use the last
portion of sid to create with O_EXCL (and if we fail, append a '.%d'
counter to retry)" or something simple.  Which I think takes us
closer to your earlier and unpublished draft, but this time we can
say that we omitted customizability after making sure that there is
not much interest---so I think it was worth it.

People who really want customizability can and are welcome to argue
otherwise and then I may change my assessment of the level of
interest in customizability, but the above is my current feeling.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/2] Randomize / timestamp trace2 targets
  2019-03-15 20:38   ` Josh Steadmon
@ 2019-03-18 12:50     ` Jeff Hostetler
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff Hostetler @ 2019-03-18 12:50 UTC (permalink / raw)
  To: Josh Steadmon, git, Ævar Arnfjörð Bjarmason



On 3/15/2019 4:38 PM, Josh Steadmon wrote:
> On 2019.03.15 15:18, Jeff Hostetler wrote:
>>
>>
>> On 3/13/2019 7:33 PM, Josh Steadmon wrote:
>>> Persistently enabling trace2 output is difficult because it requires
>>> specifying a full filename. This series teaches tr2_dst_get_trace_fd()
>>> to randomize filenames when a directory or filename prefix are given as
>>> targets in the GIT_TR2_* envvars. It also allows expansion of a
>>> timestamp template string into the current UTC timestamp.
>>
>>
>> Does the use of envvar cause issues?  Or is it just the fixed absolute
>> pathname?  When I started this, I was trying to keep the GIT_TRACE
>> model.
>>
>> As was briefly discussed in [1] I was thinking of adding a way to
>> have a personal and/or system setting, but not a git config variable,
>> that would let you setup tracing without relying on environment
>> variables.
>>
>> It's something that I've been wanting to do, it just got buried by
>> $DAYJOB stuff.  I can try to bump it up if there's interest.
>>
>> Thanks
>> Jeff
>>
>> [1] https://public-inbox.org/git/pull.108.v4.git.gitgitgadget@gmail.com/T/#mcce3b088aabc42e9867f8a9e09dd7dea5f9a773a
> 
> Yeah in our case it's the fixed filename that's the issue. It's more
> convenient for us to have lots of small files that are "finished"
> relatively quickly, rather than one that is constantly appended to.
> 

Ok. Great.  So if you want to go ahead and do the "if it's a directory
append the last component of the sid" thing, I'll wait for that and
then add the envvar alternative being discussed in another fork of this
thread.

Thanks
Jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] trace2: randomize/timestamp trace2 targets
  2019-03-18  1:40         ` Junio C Hamano
@ 2019-03-19  3:17           ` Jeff King
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2019-03-19  3:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Josh Steadmon, Ævar Arnfjörð Bjarmason, git, git

On Mon, Mar 18, 2019 at 10:40:00AM +0900, Junio C Hamano wrote:

> After seeing that the possibilities got discussed on the list, and
> that nobody seems to be very much demanding customizability (I am
> taking Ævar's mention of strftime as a mere "if we were doing an
> optional timestamp, why not do so in an even more customizable way?"
> nice-to-have, not as a "we must allow hourly or daily log, adjusting
> for each host's needs" must-have), I actually am fine if we declare
> that we've chosen the hard-coded "if it is a directory, use the last
> portion of sid to create with O_EXCL (and if we fail, append a '.%d'
> counter to retry)" or something simple.  Which I think takes us
> closer to your earlier and unpublished draft, but this time we can
> say that we omitted customizability after making sure that there is
> not much interest---so I think it was worth it.
> 
> People who really want customizability can and are welcome to argue
> otherwise and then I may change my assessment of the level of
> interest in customizability, but the above is my current feeling.

I do not really care that much about this particular issue (and I
haven't even really use trace2 for anything yet). My main concern was
just painting ourselves into a corner, and making things explicit rather
than implicit helps with that (i.e., having the user give us a
placeholder that tells us what to do instead of selecting one of several
reasonable behaviors based on whether the path exists).

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 0/1] Write trace2 output to directories
  2019-03-13 23:33 [PATCH 0/2] Randomize / timestamp trace2 targets Josh Steadmon
                   ` (3 preceding siblings ...)
  2019-03-15 19:18 ` Jeff Hostetler
@ 2019-03-21  0:16 ` Josh Steadmon
  2019-03-21  0:16   ` [PATCH v2 1/1] trace2: write to directory targets Josh Steadmon
  2019-03-21 21:09 ` [PATCH v3 0/1] Write trace2 output to directories Josh Steadmon
  5 siblings, 1 reply; 34+ messages in thread
From: Josh Steadmon @ 2019-03-21  0:16 UTC (permalink / raw)
  To: git, git, avarab, gitster, peff, Johannes.Schindelin

Persistently enabling trace2 output is difficult because it requires
specifying a full filename. This series teaches tr2_dst_get_trace_fd()
to create files underneath a given directory provided as the target of
the GIT_TR2_* envvars.

Changes since V1:

* No longer supports timestamp templates.
* No longer supports filename prefixes.
* Always creates filenames based on the final component of the trace2
  SID.

Josh Steadmon (1):
  trace2: write to directory targets

 Documentation/technical/api-trace2.txt |  5 +++
 t/t0210-trace2-normal.sh               | 15 +++++++
 trace2/tr2_dst.c                       | 61 +++++++++++++++++++++++++-
 3 files changed, 79 insertions(+), 2 deletions(-)

Range-diff against v1:
1:  e6b01ad4bf < -:  ---------- date: make get_time() public
2:  17ec237ba7 < -:  ---------- trace2: randomize/timestamp trace2 targets
-:  ---------- > 1:  59d8c6511b trace2: write to directory targets
-- 
2.21.0.225.g810b269d1ac-goog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 1/1] trace2: write to directory targets
  2019-03-21  0:16 ` [PATCH v2 0/1] Write trace2 output to directories Josh Steadmon
@ 2019-03-21  0:16   ` Josh Steadmon
  2019-03-21  2:04     ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Josh Steadmon @ 2019-03-21  0:16 UTC (permalink / raw)
  To: git, git, avarab, gitster, peff, Johannes.Schindelin

When the value of a trace2 environment variable is an absolute path
referring to an existing directory, write output to files (one per
process) underneath the given directory. Files will be named according
to the final component of the trace2 SID, followed by a counter to avoid
potential collisions.

This makes it more convenient to collect traces for every git invocation
by unconditionally setting the relevant trace2 envvar to a constant
directory name.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt |  5 +++
 t/t0210-trace2-normal.sh               | 15 +++++++
 trace2/tr2_dst.c                       | 61 +++++++++++++++++++++++++-
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 2de565fa3d..318673e8e5 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -109,6 +109,11 @@ values are recognized.
 
 	Enables the target, opens and writes to the file in append mode.
 
+	If the target already exists and is a directory, the traces will be
+	written to files (one per process) underneath the given directory. They
+	will be named according to the last component of the SID followed by a
+	counter to avoid potential collisions.
+
 `af_unix:[<socket_type>:]<absolute-pathname>`::
 
 	Enables the target, opens and writes to a Unix Domain Socket
diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index 03a0aedb1d..26c9c1b3b8 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -80,6 +80,21 @@ test_expect_success 'normal stream, return code 1' '
 	test_cmp expect actual
 '
 
+test_expect_success 'randomized filename' '
+	test_when_finished "rm -r traces actual expect" &&
+	mkdir traces &&
+	GIT_TR2="$(pwd)/traces" test-tool trace2 001return 0 &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <"$(ls traces/*)" >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 001return 0
+		cmd_name trace2 (trace2)
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+
 # Verb 002exit
 #
 # Explicit exit(code) from within cmd_<verb> propagates <code>.
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index fd490a43ad..0e752914dc 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "trace2/tr2_dst.h"
+#include "trace2/tr2_sid.h"
 
 /*
  * If a Trace2 target cannot be opened for writing, we should issue a
@@ -12,6 +13,11 @@
  */
 #define TR2_ENVVAR_DST_DEBUG "GIT_TR2_DST_DEBUG"
 
+/*
+ * How many attempts we will make at creating a random trace output path.
+ */
+#define MAX_RANDOM_ATTEMPTS 10
+
 static int tr2_dst_want_warning(void)
 {
 	static int tr2env_dst_debug = -1;
@@ -36,6 +42,53 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
 	dst->need_close = 0;
 }
 
+static int tr2_dst_try_random_path(struct tr2_dst *dst, const char *tgt_prefix)
+{
+	int fd;
+	const char *last_slash, *sid = tr2_sid_get();
+	struct strbuf base_path = STRBUF_INIT, final_path = STRBUF_INIT;
+	unsigned attempt_count;
+
+	last_slash = strrchr(sid, '/');
+	if (last_slash)
+		sid = last_slash + 1;
+
+	strbuf_addstr(&base_path, tgt_prefix);
+	if (!is_dir_sep(base_path.buf[base_path.len - 1]))
+		strbuf_addch(&base_path, '/');
+	strbuf_addstr(&base_path, sid);
+
+	for (attempt_count = 0; attempt_count < MAX_RANDOM_ATTEMPTS; attempt_count++) {
+		strbuf_reset(&final_path);
+		strbuf_addbuf(&final_path, &base_path);
+		strbuf_addf(&final_path, ".%d", attempt_count);
+
+		fd = open(final_path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
+		if (fd != -1)
+			break;
+	}
+
+	if (fd == -1) {
+		if (tr2_dst_want_warning())
+			warning("trace2: could not open '%s' for '%s' tracing: %s",
+				base_path.buf, dst->env_var_name, strerror(errno));
+
+		tr2_dst_trace_disable(dst);
+		strbuf_release(&base_path);
+		strbuf_release(&final_path);
+		return 0;
+	}
+
+	strbuf_release(&base_path);
+	strbuf_release(&final_path);
+
+	dst->fd = fd;
+	dst->need_close = 1;
+	dst->initialized = 1;
+
+	return dst->fd;
+}
+
 static int tr2_dst_try_path(struct tr2_dst *dst, const char *tgt_value)
 {
 	int fd = open(tgt_value, O_WRONLY | O_APPEND | O_CREAT, 0666);
@@ -202,8 +255,12 @@ int tr2_dst_get_trace_fd(struct tr2_dst *dst)
 		return dst->fd;
 	}
 
-	if (is_absolute_path(tgt_value))
-		return tr2_dst_try_path(dst, tgt_value);
+	if (is_absolute_path(tgt_value)) {
+		if (is_directory(tgt_value))
+			return tr2_dst_try_random_path(dst, tgt_value);
+		else
+			return tr2_dst_try_path(dst, tgt_value);
+	}
 
 #ifndef NO_UNIX_SOCKETS
 	if (starts_with(tgt_value, PREFIX_AF_UNIX))
-- 
2.21.0.225.g810b269d1ac-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/1] trace2: write to directory targets
  2019-03-21  0:16   ` [PATCH v2 1/1] trace2: write to directory targets Josh Steadmon
@ 2019-03-21  2:04     ` Junio C Hamano
  2019-03-21 17:43       ` Jeff Hostetler
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2019-03-21  2:04 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, git, avarab, peff, Johannes.Schindelin

Josh Steadmon <steadmon@google.com> writes:

> diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
> index 03a0aedb1d..26c9c1b3b8 100755
> --- a/t/t0210-trace2-normal.sh
> +++ b/t/t0210-trace2-normal.sh
> @@ -80,6 +80,21 @@ test_expect_success 'normal stream, return code 1' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'randomized filename' '
> +	test_when_finished "rm -r traces actual expect" &&
> +	mkdir traces &&
> +	GIT_TR2="$(pwd)/traces" test-tool trace2 001return 0 &&
> +	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <"$(ls traces/*)" >actual &&

This is cute.

What we want to test for this new feature is that the directory
traces/ that was originally empty now has exactly one readable file,
which was created by producing a trace.

And redirecting from "$(ls traces/*)" would succeed only when there
is exactly one readble file in the directory.  If it has none, or
more than one, the redirection will fail and we'd notice the error.

> +	cat >expect <<-EOF &&
> +		version $V
> +		start _EXE_ trace2 001return 0
> +		cmd_name trace2 (trace2)
> +		exit elapsed:_TIME_ code:0
> +		atexit elapsed:_TIME_ code:0
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  # Verb 002exit
>  #
>  # Explicit exit(code) from within cmd_<verb> propagates <code>.
> diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
> index fd490a43ad..0e752914dc 100644
> --- a/trace2/tr2_dst.c
> +++ b/trace2/tr2_dst.c
> @@ -1,5 +1,6 @@
>  #include "cache.h"
>  #include "trace2/tr2_dst.h"
> +#include "trace2/tr2_sid.h"
>  
>  /*
>   * If a Trace2 target cannot be opened for writing, we should issue a
> @@ -12,6 +13,11 @@
>   */
>  #define TR2_ENVVAR_DST_DEBUG "GIT_TR2_DST_DEBUG"
>  
> +/*
> + * How many attempts we will make at creating a random trace output path.
> + */
> +#define MAX_RANDOM_ATTEMPTS 10

With the updated design, randomness is no longer the primary
property of this new feature.  The fact that the names are
automatically assigned is.  It could be that the source of tr2_sid
may (or may not) be some randomness, but the point is that the
caller in this patch does not care how tr2_sid is computed.

I'd call this max-attempts (or max-autopath-attempts, but that is
rather long, and I do not think inside the scope of "tr2_dst" that
is about "destination", there will be anything but the destination
path we'd "attempt" with a reasonable maximum value to compute, so
the "-autopath" clarification would not buy us much)....

>  static int tr2_dst_want_warning(void)
>  {
>  	static int tr2env_dst_debug = -1;
> @@ -36,6 +42,53 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
>  	dst->need_close = 0;
>  }
>  
> +static int tr2_dst_try_random_path(struct tr2_dst *dst, const char *tgt_prefix)

.... and I'd call this s/random/auto/ instead, if I were writing
this patch following the updated design.

> +{
> +	int fd;
> +	const char *last_slash, *sid = tr2_sid_get();
> +	struct strbuf base_path = STRBUF_INIT, final_path = STRBUF_INIT;
> +	unsigned attempt_count;
> +
> +	last_slash = strrchr(sid, '/');
> +	if (last_slash)
> +		sid = last_slash + 1;
> +
> +	strbuf_addstr(&base_path, tgt_prefix);
> +	if (!is_dir_sep(base_path.buf[base_path.len - 1]))
> +		strbuf_addch(&base_path, '/');
> +	strbuf_addstr(&base_path, sid);

I do not think it is such a huge deal, but you can remember the
value of base_path.len at this point and then get rid of the other
strbuf (and copying into it).  As that will leave only one path
variable you need to worry about, you can shorten base_path to just
path if you go that route.

    baselen = path.len;

> +	for (attempt_count = 0; attempt_count < MAX_RANDOM_ATTEMPTS; attempt_count++) {
> +		strbuf_reset(&final_path);
> +		strbuf_addbuf(&final_path, &base_path);
> +		strbuf_addf(&final_path, ".%d", attempt_count);
> +		fd = open(final_path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);

If you follow the "get rid of final_path" route, these would become:

    strbuf_setlen(&path, baselen);
    strbuf_addf(&path, ".%d", count);
    fd = open(path.buf, ..., 0666);

> +		if (fd != -1)
> +			break;
> +	}

And that way, you have one fewer strbuf to _release() at the end and
at early exit points.

> +		if (tr2_dst_want_warning())
> +			warning("trace2: could not open '%s' for '%s' tracing: %s",
> +				base_path.buf, dst->env_var_name, strerror(errno));

This would need to become

		warning("trace2: could not open '%.*s' for '%s' tracing: %s",
			path.buf, baselen, dst->env_var_name, strerror(errno));

if we go that route.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/1] trace2: write to directory targets
  2019-03-21  2:04     ` Junio C Hamano
@ 2019-03-21 17:43       ` Jeff Hostetler
  2019-03-22  3:30         ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff Hostetler @ 2019-03-21 17:43 UTC (permalink / raw)
  To: Junio C Hamano, Josh Steadmon; +Cc: git, avarab, peff, Johannes.Schindelin



On 3/20/2019 10:04 PM, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
[...]
>> +/*
>> + * How many attempts we will make at creating a random trace output path.
>> + */
>> +#define MAX_RANDOM_ATTEMPTS 10
> 
> With the updated design, randomness is no longer the primary
> property of this new feature.  The fact that the names are
> automatically assigned is.  It could be that the source of tr2_sid
> may (or may not) be some randomness, but the point is that the
> caller in this patch does not care how tr2_sid is computed.
> 
> I'd call this max-attempts (or max-autopath-attempts, but that is
> rather long, and I do not think inside the scope of "tr2_dst" that
> is about "destination", there will be anything but the destination
> path we'd "attempt" with a reasonable maximum value to compute, so
> the "-autopath" clarification would not buy us much)....
> 
>>   static int tr2_dst_want_warning(void)
>>   {
>>   	static int tr2env_dst_debug = -1;
>> @@ -36,6 +42,53 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
>>   	dst->need_close = 0;
>>   }
>>   
>> +static int tr2_dst_try_random_path(struct tr2_dst *dst, const char *tgt_prefix)
> 
> .... and I'd call this s/random/auto/ instead, if I were writing
> this patch following the updated design.

I agree with Junio WRT the s/random/auto/ suggestions.

[...]
>> +	for (attempt_count = 0; attempt_count < MAX_RANDOM_ATTEMPTS; attempt_count++) {
>> +		strbuf_reset(&final_path);
>> +		strbuf_addbuf(&final_path, &base_path);
>> +		strbuf_addf(&final_path, ".%d", attempt_count);

Since the last component of the SID is already very unique and
we're unlikely to have collisions, maybe change the above line to:

		if (attempt_count > 0)
			strbuf_addf(&final_path, ".%d", attempt_count);

and in reality expect to never have files with the suffix.

Unless, that is, they turned on more than one of GIT_TR2,
GIT_TR2_PERF, or GIT_TR2_EVENT and pointed them at the same
directory, but I'm not sure if I care about that edge case
or not.

>> +		fd = open(final_path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
[...]

Nice.  Thanks for looking into this.
Jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 0/1] Write trace2 output to directories
  2019-03-13 23:33 [PATCH 0/2] Randomize / timestamp trace2 targets Josh Steadmon
                   ` (4 preceding siblings ...)
  2019-03-21  0:16 ` [PATCH v2 0/1] Write trace2 output to directories Josh Steadmon
@ 2019-03-21 21:09 ` Josh Steadmon
  2019-03-21 21:09   ` [PATCH v3 1/1] trace2: write to directory targets Josh Steadmon
  2019-03-21 21:16   ` [PATCH v3 0/1] Write trace2 output to directories Jeff Hostetler
  5 siblings, 2 replies; 34+ messages in thread
From: Josh Steadmon @ 2019-03-21 21:09 UTC (permalink / raw)
  To: git, gitster, git

Persistently enabling trace2 output is difficult because it requires
specifying a full filename. This series teaches tr2_dst_get_trace_fd()
to create files underneath a given directory provided as the target of
the GIT_TR2_* envvars.

Changes since V2:
* No longer appends a suffix to filenames on the first creation attempt.
* Renamed function & constant now that randomization is no longer used.
* Simplified strbuf usage.

Changes since V1:
* No longer supports timestamp templates.
* No longer supports filename prefixes.
* Always creates filenames based on the final component of the trace2
  SID.

Josh Steadmon (1):
  trace2: write to directory targets

 Documentation/technical/api-trace2.txt |  5 ++
 t/t0210-trace2-normal.sh               | 15 ++++++
 trace2/tr2_dst.c                       | 63 +++++++++++++++++++++++++-
 3 files changed, 81 insertions(+), 2 deletions(-)

Range-diff against v2:
1:  59d8c6511b ! 1:  ce5258610f trace2: write to directory targets
    @@ -23,8 +23,8 @@
      
     +	If the target already exists and is a directory, the traces will be
     +	written to files (one per process) underneath the given directory. They
    -+	will be named according to the last component of the SID followed by a
    -+	counter to avoid potential collisions.
    ++	will be named according to the last component of the SID (optionally
    ++	followed by a counter to avoid filename collisions).
     +
      `af_unix:[<socket_type>:]<absolute-pathname>`::
      
    @@ -37,7 +37,7 @@
      	test_cmp expect actual
      '
      
    -+test_expect_success 'randomized filename' '
    ++test_expect_success 'automatic filename' '
     +	test_when_finished "rm -r traces actual expect" &&
     +	mkdir traces &&
     +	GIT_TR2="$(pwd)/traces" test-tool trace2 001return 0 &&
    @@ -71,9 +71,9 @@
      #define TR2_ENVVAR_DST_DEBUG "GIT_TR2_DST_DEBUG"
      
     +/*
    -+ * How many attempts we will make at creating a random trace output path.
    ++ * How many attempts we will make at creating an automatically-named trace file.
     + */
    -+#define MAX_RANDOM_ATTEMPTS 10
    ++#define MAX_AUTO_ATTEMPTS 10
     +
      static int tr2_dst_want_warning(void)
      {
    @@ -82,45 +82,47 @@
      	dst->need_close = 0;
      }
      
    -+static int tr2_dst_try_random_path(struct tr2_dst *dst, const char *tgt_prefix)
    ++static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
     +{
     +	int fd;
     +	const char *last_slash, *sid = tr2_sid_get();
    -+	struct strbuf base_path = STRBUF_INIT, final_path = STRBUF_INIT;
    ++	struct strbuf path = STRBUF_INIT;
    ++	size_t base_path_len;
     +	unsigned attempt_count;
     +
     +	last_slash = strrchr(sid, '/');
     +	if (last_slash)
     +		sid = last_slash + 1;
     +
    -+	strbuf_addstr(&base_path, tgt_prefix);
    -+	if (!is_dir_sep(base_path.buf[base_path.len - 1]))
    -+		strbuf_addch(&base_path, '/');
    -+	strbuf_addstr(&base_path, sid);
    ++	strbuf_addstr(&path, tgt_prefix);
    ++	if (!is_dir_sep(path.buf[path.len - 1]))
    ++		strbuf_addch(&path, '/');
    ++	strbuf_addstr(&path, sid);
    ++	base_path_len = path.len;
     +
    -+	for (attempt_count = 0; attempt_count < MAX_RANDOM_ATTEMPTS; attempt_count++) {
    -+		strbuf_reset(&final_path);
    -+		strbuf_addbuf(&final_path, &base_path);
    -+		strbuf_addf(&final_path, ".%d", attempt_count);
    ++	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
    ++		if (attempt_count > 0) {
    ++			strbuf_setlen(&path, base_path_len);
    ++			strbuf_addf(&path, ".%d", attempt_count);
    ++		}
     +
    -+		fd = open(final_path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
    ++		fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
     +		if (fd != -1)
     +			break;
     +	}
     +
     +	if (fd == -1) {
     +		if (tr2_dst_want_warning())
    -+			warning("trace2: could not open '%s' for '%s' tracing: %s",
    -+				base_path.buf, dst->env_var_name, strerror(errno));
    ++			warning("trace2: could not open '%.*s' for '%s' tracing: %s",
    ++				(int) base_path_len, path.buf,
    ++				dst->env_var_name, strerror(errno));
     +
     +		tr2_dst_trace_disable(dst);
    -+		strbuf_release(&base_path);
    -+		strbuf_release(&final_path);
    ++		strbuf_release(&path);
     +		return 0;
     +	}
     +
    -+	strbuf_release(&base_path);
    -+	strbuf_release(&final_path);
    ++	strbuf_release(&path);
     +
     +	dst->fd = fd;
     +	dst->need_close = 1;
    @@ -140,7 +142,7 @@
     -		return tr2_dst_try_path(dst, tgt_value);
     +	if (is_absolute_path(tgt_value)) {
     +		if (is_directory(tgt_value))
    -+			return tr2_dst_try_random_path(dst, tgt_value);
    ++			return tr2_dst_try_auto_path(dst, tgt_value);
     +		else
     +			return tr2_dst_try_path(dst, tgt_value);
     +	}
-- 
2.21.0.225.g810b269d1ac-goog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 1/1] trace2: write to directory targets
  2019-03-21 21:09 ` [PATCH v3 0/1] Write trace2 output to directories Josh Steadmon
@ 2019-03-21 21:09   ` Josh Steadmon
  2019-03-23 20:44     ` Ævar Arnfjörð Bjarmason
  2019-03-21 21:16   ` [PATCH v3 0/1] Write trace2 output to directories Jeff Hostetler
  1 sibling, 1 reply; 34+ messages in thread
From: Josh Steadmon @ 2019-03-21 21:09 UTC (permalink / raw)
  To: git, gitster, git

When the value of a trace2 environment variable is an absolute path
referring to an existing directory, write output to files (one per
process) underneath the given directory. Files will be named according
to the final component of the trace2 SID, followed by a counter to avoid
potential collisions.

This makes it more convenient to collect traces for every git invocation
by unconditionally setting the relevant trace2 envvar to a constant
directory name.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/api-trace2.txt |  5 ++
 t/t0210-trace2-normal.sh               | 15 ++++++
 trace2/tr2_dst.c                       | 63 +++++++++++++++++++++++++-
 3 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 2de565fa3d..d0948ba250 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -109,6 +109,11 @@ values are recognized.
 
 	Enables the target, opens and writes to the file in append mode.
 
+	If the target already exists and is a directory, the traces will be
+	written to files (one per process) underneath the given directory. They
+	will be named according to the last component of the SID (optionally
+	followed by a counter to avoid filename collisions).
+
 `af_unix:[<socket_type>:]<absolute-pathname>`::
 
 	Enables the target, opens and writes to a Unix Domain Socket
diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index 03a0aedb1d..819430658b 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -80,6 +80,21 @@ test_expect_success 'normal stream, return code 1' '
 	test_cmp expect actual
 '
 
+test_expect_success 'automatic filename' '
+	test_when_finished "rm -r traces actual expect" &&
+	mkdir traces &&
+	GIT_TR2="$(pwd)/traces" test-tool trace2 001return 0 &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <"$(ls traces/*)" >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 001return 0
+		cmd_name trace2 (trace2)
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+
 # Verb 002exit
 #
 # Explicit exit(code) from within cmd_<verb> propagates <code>.
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index fd490a43ad..c3d82ca6a4 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "trace2/tr2_dst.h"
+#include "trace2/tr2_sid.h"
 
 /*
  * If a Trace2 target cannot be opened for writing, we should issue a
@@ -12,6 +13,11 @@
  */
 #define TR2_ENVVAR_DST_DEBUG "GIT_TR2_DST_DEBUG"
 
+/*
+ * How many attempts we will make at creating an automatically-named trace file.
+ */
+#define MAX_AUTO_ATTEMPTS 10
+
 static int tr2_dst_want_warning(void)
 {
 	static int tr2env_dst_debug = -1;
@@ -36,6 +42,55 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
 	dst->need_close = 0;
 }
 
+static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
+{
+	int fd;
+	const char *last_slash, *sid = tr2_sid_get();
+	struct strbuf path = STRBUF_INIT;
+	size_t base_path_len;
+	unsigned attempt_count;
+
+	last_slash = strrchr(sid, '/');
+	if (last_slash)
+		sid = last_slash + 1;
+
+	strbuf_addstr(&path, tgt_prefix);
+	if (!is_dir_sep(path.buf[path.len - 1]))
+		strbuf_addch(&path, '/');
+	strbuf_addstr(&path, sid);
+	base_path_len = path.len;
+
+	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
+		if (attempt_count > 0) {
+			strbuf_setlen(&path, base_path_len);
+			strbuf_addf(&path, ".%d", attempt_count);
+		}
+
+		fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
+		if (fd != -1)
+			break;
+	}
+
+	if (fd == -1) {
+		if (tr2_dst_want_warning())
+			warning("trace2: could not open '%.*s' for '%s' tracing: %s",
+				(int) base_path_len, path.buf,
+				dst->env_var_name, strerror(errno));
+
+		tr2_dst_trace_disable(dst);
+		strbuf_release(&path);
+		return 0;
+	}
+
+	strbuf_release(&path);
+
+	dst->fd = fd;
+	dst->need_close = 1;
+	dst->initialized = 1;
+
+	return dst->fd;
+}
+
 static int tr2_dst_try_path(struct tr2_dst *dst, const char *tgt_value)
 {
 	int fd = open(tgt_value, O_WRONLY | O_APPEND | O_CREAT, 0666);
@@ -202,8 +257,12 @@ int tr2_dst_get_trace_fd(struct tr2_dst *dst)
 		return dst->fd;
 	}
 
-	if (is_absolute_path(tgt_value))
-		return tr2_dst_try_path(dst, tgt_value);
+	if (is_absolute_path(tgt_value)) {
+		if (is_directory(tgt_value))
+			return tr2_dst_try_auto_path(dst, tgt_value);
+		else
+			return tr2_dst_try_path(dst, tgt_value);
+	}
 
 #ifndef NO_UNIX_SOCKETS
 	if (starts_with(tgt_value, PREFIX_AF_UNIX))
-- 
2.21.0.225.g810b269d1ac-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/1] Write trace2 output to directories
  2019-03-21 21:09 ` [PATCH v3 0/1] Write trace2 output to directories Josh Steadmon
  2019-03-21 21:09   ` [PATCH v3 1/1] trace2: write to directory targets Josh Steadmon
@ 2019-03-21 21:16   ` Jeff Hostetler
  2019-03-22  5:23     ` Junio C Hamano
  1 sibling, 1 reply; 34+ messages in thread
From: Jeff Hostetler @ 2019-03-21 21:16 UTC (permalink / raw)
  To: Josh Steadmon, git, gitster



On 3/21/2019 5:09 PM, Josh Steadmon wrote:
> Persistently enabling trace2 output is difficult because it requires
> specifying a full filename. This series teaches tr2_dst_get_trace_fd()
> to create files underneath a given directory provided as the target of
> the GIT_TR2_* envvars.
> 
> Changes since V2:
> * No longer appends a suffix to filenames on the first creation attempt.
> * Renamed function & constant now that randomization is no longer used.
> * Simplified strbuf usage.
> 
> Changes since V1:
> * No longer supports timestamp templates.
> * No longer supports filename prefixes.
> * Always creates filenames based on the final component of the trace2
>    SID.
> 
> Josh Steadmon (1):
>    trace2: write to directory targets
> 
>   Documentation/technical/api-trace2.txt |  5 ++
>   t/t0210-trace2-normal.sh               | 15 ++++++
>   trace2/tr2_dst.c                       | 63 +++++++++++++++++++++++++-
>   3 files changed, 81 insertions(+), 2 deletions(-)
> 
> Range-diff against v2:
> 1:  59d8c6511b ! 1:  ce5258610f trace2: write to directory targets
>      @@ -23,8 +23,8 @@
>        
>       +	If the target already exists and is a directory, the traces will be
>       +	written to files (one per process) underneath the given directory. They
>      -+	will be named according to the last component of the SID followed by a
>      -+	counter to avoid potential collisions.
>      ++	will be named according to the last component of the SID (optionally
>      ++	followed by a counter to avoid filename collisions).
>       +
>        `af_unix:[<socket_type>:]<absolute-pathname>`::
>        
>      @@ -37,7 +37,7 @@
>        	test_cmp expect actual
>        '
>        
>      -+test_expect_success 'randomized filename' '
>      ++test_expect_success 'automatic filename' '
>       +	test_when_finished "rm -r traces actual expect" &&
>       +	mkdir traces &&
>       +	GIT_TR2="$(pwd)/traces" test-tool trace2 001return 0 &&
>      @@ -71,9 +71,9 @@
>        #define TR2_ENVVAR_DST_DEBUG "GIT_TR2_DST_DEBUG"
>        
>       +/*
>      -+ * How many attempts we will make at creating a random trace output path.
>      ++ * How many attempts we will make at creating an automatically-named trace file.
>       + */
>      -+#define MAX_RANDOM_ATTEMPTS 10
>      ++#define MAX_AUTO_ATTEMPTS 10
>       +
>        static int tr2_dst_want_warning(void)
>        {
>      @@ -82,45 +82,47 @@
>        	dst->need_close = 0;
>        }
>        
>      -+static int tr2_dst_try_random_path(struct tr2_dst *dst, const char *tgt_prefix)
>      ++static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
>       +{
>       +	int fd;
>       +	const char *last_slash, *sid = tr2_sid_get();
>      -+	struct strbuf base_path = STRBUF_INIT, final_path = STRBUF_INIT;
>      ++	struct strbuf path = STRBUF_INIT;
>      ++	size_t base_path_len;
>       +	unsigned attempt_count;
>       +
>       +	last_slash = strrchr(sid, '/');
>       +	if (last_slash)
>       +		sid = last_slash + 1;
>       +
>      -+	strbuf_addstr(&base_path, tgt_prefix);
>      -+	if (!is_dir_sep(base_path.buf[base_path.len - 1]))
>      -+		strbuf_addch(&base_path, '/');
>      -+	strbuf_addstr(&base_path, sid);
>      ++	strbuf_addstr(&path, tgt_prefix);
>      ++	if (!is_dir_sep(path.buf[path.len - 1]))
>      ++		strbuf_addch(&path, '/');
>      ++	strbuf_addstr(&path, sid);
>      ++	base_path_len = path.len;
>       +
>      -+	for (attempt_count = 0; attempt_count < MAX_RANDOM_ATTEMPTS; attempt_count++) {
>      -+		strbuf_reset(&final_path);
>      -+		strbuf_addbuf(&final_path, &base_path);
>      -+		strbuf_addf(&final_path, ".%d", attempt_count);
>      ++	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
>      ++		if (attempt_count > 0) {
>      ++			strbuf_setlen(&path, base_path_len);
>      ++			strbuf_addf(&path, ".%d", attempt_count);
>      ++		}
>       +
>      -+		fd = open(final_path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
>      ++		fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
>       +		if (fd != -1)
>       +			break;
>       +	}
>       +
>       +	if (fd == -1) {
>       +		if (tr2_dst_want_warning())
>      -+			warning("trace2: could not open '%s' for '%s' tracing: %s",
>      -+				base_path.buf, dst->env_var_name, strerror(errno));
>      ++			warning("trace2: could not open '%.*s' for '%s' tracing: %s",
>      ++				(int) base_path_len, path.buf,
>      ++				dst->env_var_name, strerror(errno));
>       +
>       +		tr2_dst_trace_disable(dst);
>      -+		strbuf_release(&base_path);
>      -+		strbuf_release(&final_path);
>      ++		strbuf_release(&path);
>       +		return 0;
>       +	}
>       +
>      -+	strbuf_release(&base_path);
>      -+	strbuf_release(&final_path);
>      ++	strbuf_release(&path);
>       +
>       +	dst->fd = fd;
>       +	dst->need_close = 1;
>      @@ -140,7 +142,7 @@
>       -		return tr2_dst_try_path(dst, tgt_value);
>       +	if (is_absolute_path(tgt_value)) {
>       +		if (is_directory(tgt_value))
>      -+			return tr2_dst_try_random_path(dst, tgt_value);
>      ++			return tr2_dst_try_auto_path(dst, tgt_value);
>       +		else
>       +			return tr2_dst_try_path(dst, tgt_value);
>       +	}
> 

Looks good.  Thanks.
Jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/1] trace2: write to directory targets
  2019-03-21 17:43       ` Jeff Hostetler
@ 2019-03-22  3:30         ` Junio C Hamano
  2019-03-22 14:20           ` Jeff Hostetler
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2019-03-22  3:30 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Josh Steadmon, git, avarab, peff, Johannes.Schindelin

Jeff Hostetler <git@jeffhostetler.com> writes:

>>> +	for (attempt_count = 0; attempt_count < MAX_RANDOM_ATTEMPTS; attempt_count++) {
>>> +		strbuf_reset(&final_path);
>>> +		strbuf_addbuf(&final_path, &base_path);
>>> +		strbuf_addf(&final_path, ".%d", attempt_count);
>
> Since the last component of the SID is already very unique and
> we're unlikely to have collisions, maybe change the above line to:
>
> 		if (attempt_count > 0)
> 			strbuf_addf(&final_path, ".%d", attempt_count);
>
> and in reality expect to never have files with the suffix.

That's a nice property.

> Unless, that is, they turned on more than one of GIT_TR2,
> GIT_TR2_PERF, or GIT_TR2_EVENT and pointed them at the same
> directory, but I'm not sure if I care about that edge case
> or not.

That actually makes me wonder if the auto generated filenames want
to have a common trait (e.g. suffix) that allows the readers to tell
from which of these environment variables the names came from.  It
would not be very useful if two files with the same sid component
had .1 suffix for GIT_TR2 trace for one session, and the same .1
suffix is used for GIT_TR2_PERF trace for a pair of files from
another session.

But let's not worry about it for now.  If people do not want them
get mixed up and become hard to tell apart, they can always specify
different directories for different traces.

>
>>> +		fd = open(final_path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
> [...]
>
> Nice.  Thanks for looking into this.
> Jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/1] Write trace2 output to directories
  2019-03-21 21:16   ` [PATCH v3 0/1] Write trace2 output to directories Jeff Hostetler
@ 2019-03-22  5:23     ` Junio C Hamano
  0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2019-03-22  5:23 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Josh Steadmon, git

Jeff Hostetler <git@jeffhostetler.com> writes:

> On 3/21/2019 5:09 PM, Josh Steadmon wrote:
>> Persistently enabling trace2 output is difficult because it requires
>> specifying a full filename. This series teaches tr2_dst_get_trace_fd()
>> to create files underneath a given directory provided as the target of
>> the GIT_TR2_* envvars.
> ...
> Looks good.  Thanks.
> Jeff

Thanks, both.  Will queue.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/1] trace2: write to directory targets
  2019-03-22  3:30         ` Junio C Hamano
@ 2019-03-22 14:20           ` Jeff Hostetler
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff Hostetler @ 2019-03-22 14:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git, avarab, peff, Johannes.Schindelin



On 3/21/2019 11:30 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>>>> +	for (attempt_count = 0; attempt_count < MAX_RANDOM_ATTEMPTS; attempt_count++) {
>>>> +		strbuf_reset(&final_path);
>>>> +		strbuf_addbuf(&final_path, &base_path);
>>>> +		strbuf_addf(&final_path, ".%d", attempt_count);
>>
>> Since the last component of the SID is already very unique and
>> we're unlikely to have collisions, maybe change the above line to:
>>
>> 		if (attempt_count > 0)
>> 			strbuf_addf(&final_path, ".%d", attempt_count);
>>
>> and in reality expect to never have files with the suffix.
> 
> That's a nice property.
> 
>> Unless, that is, they turned on more than one of GIT_TR2,
>> GIT_TR2_PERF, or GIT_TR2_EVENT and pointed them at the same
>> directory, but I'm not sure if I care about that edge case
>> or not.
> 
> That actually makes me wonder if the auto generated filenames want
> to have a common trait (e.g. suffix) that allows the readers to tell
> from which of these environment variables the names came from.  It
> would not be very useful if two files with the same sid component
> had .1 suffix for GIT_TR2 trace for one session, and the same .1
> suffix is used for GIT_TR2_PERF trace for a pair of files from
> another session.

I thought about suggesting that, but didn't think it worth the bother.

> But let's not worry about it for now.  If people do not want them
> get mixed up and become hard to tell apart, they can always specify
> different directories for different traces.

agreed. it should be very rare.  the SID is the built from
	"<microseconds-since-the-epoch><dash><pid>"
so something would have to be seriously wrong with their
system to get collision from 2 different git commands.

and yes, they the advise should be to use different directories
for the different streams.

> 
>>
>>>> +		fd = open(final_path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
>> [...]
>>
>> Nice.  Thanks for looking into this.
>> Jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/1] trace2: write to directory targets
  2019-03-21 21:09   ` [PATCH v3 1/1] trace2: write to directory targets Josh Steadmon
@ 2019-03-23 20:44     ` Ævar Arnfjörð Bjarmason
  2019-03-24 12:33       ` Junio C Hamano
  2019-03-25 16:29       ` Jeff Hostetler
  0 siblings, 2 replies; 34+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-23 20:44 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, gitster, git


On Thu, Mar 21 2019, Josh Steadmon wrote:

> When the value of a trace2 environment variable is an absolute path
> referring to an existing directory, write output to files (one per
> process) underneath the given directory. Files will be named according
> to the final component of the trace2 SID, followed by a counter to avoid
> potential collisions.

Is this "counter to avoid collisions" something you've seen the need to
have in practice, or could we just squash this on top:

    diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
    index c3d82ca6a4..06cbef5837 100644
    --- a/trace2/tr2_dst.c
    +++ b/trace2/tr2_dst.c
    @@ -13,11 +13,6 @@
      */
     #define TR2_ENVVAR_DST_DEBUG "GIT_TR2_DST_DEBUG"

    -/*
    - * How many attempts we will make at creating an automatically-named trace file.
    - */
    -#define MAX_AUTO_ATTEMPTS 10
    -
     static int tr2_dst_want_warning(void)
     {
     	static int tr2env_dst_debug = -1;
    @@ -48,7 +43,6 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
     	const char *last_slash, *sid = tr2_sid_get();
     	struct strbuf path = STRBUF_INIT;
     	size_t base_path_len;
    -	unsigned attempt_count;

     	last_slash = strrchr(sid, '/');
     	if (last_slash)
    @@ -60,17 +54,7 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
     	strbuf_addstr(&path, sid);
     	base_path_len = path.len;

    -	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
    -		if (attempt_count > 0) {
    -			strbuf_setlen(&path, base_path_len);
    -			strbuf_addf(&path, ".%d", attempt_count);
    -		}
    -
    -		fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
    -		if (fd != -1)
    -			break;
    -	}
    -
    +	fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
     	if (fd == -1) {
     		if (tr2_dst_want_warning())
     			warning("trace2: could not open '%.*s' for '%s' tracing: %s",

The reason I'm raising this is that it seems like sweeping an existing
issue under the rug. We document that the "sid" is "unique", and it's just:

    <nanotime / 1000 (i.e. *nix time in microseconds)>-<pid>

So that might be a lie, and in particular I can imagine that say if
every machine at Google is logging traces into some magic mounted FS
that there'll be collisions there.

But then let's *fix that*, because we're also e.g. going to have other
consumers of these traces using the sid's as primary keys in a logging
system.

I wonder if we should just make it a bit longer, human-readable, and
include a hash of the hostname:

    perl -MTime::HiRes=gettimeofday -MSys::Hostname -MDigest::SHA=sha1_hex -MPOSIX=strftime -wE '
        my ($t, $m) = gettimeofday;
        my $host_hex = substr sha1_hex(hostname()), 0, 8;
        my $htime = strftime("%Y%m%d%H%M%S", localtime);
        my $sid = sprintf("%s-%6d-%s-%s",
            $htime,
            $m,
            $host_hex,
            $$ & 0xFFFF,
        );
        say $sid;
    '

Which gets you a SID like:

    20190323213918-404788-c2f5b994-19027

I.e.:

    <YYYYMMDDHHMMSS>-<microsecond-offset>-<8 chars of sha1(hostname -f)>-<pid>

There's obviously ways to make that more compact, but in this case I
couldn't see a reason to, also using UTC would be a good idea.

All the trace2 tests pass if I fake that up. Jeff H: Do you have
anything that relies on the current format?

> This makes it more convenient to collect traces for every git invocation
> by unconditionally setting the relevant trace2 envvar to a constant
> directory name.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  Documentation/technical/api-trace2.txt |  5 ++
>  t/t0210-trace2-normal.sh               | 15 ++++++
>  trace2/tr2_dst.c                       | 63 +++++++++++++++++++++++++-
>  3 files changed, 81 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
> index 2de565fa3d..d0948ba250 100644
> --- a/Documentation/technical/api-trace2.txt
> +++ b/Documentation/technical/api-trace2.txt
> @@ -109,6 +109,11 @@ values are recognized.
>
>  	Enables the target, opens and writes to the file in append mode.
>
> +	If the target already exists and is a directory, the traces will be
> +	written to files (one per process) underneath the given directory. They
> +	will be named according to the last component of the SID (optionally
> +	followed by a counter to avoid filename collisions).
> +
>  `af_unix:[<socket_type>:]<absolute-pathname>`::
>
>  	Enables the target, opens and writes to a Unix Domain Socket
> diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
> index 03a0aedb1d..819430658b 100755
> --- a/t/t0210-trace2-normal.sh
> +++ b/t/t0210-trace2-normal.sh
> @@ -80,6 +80,21 @@ test_expect_success 'normal stream, return code 1' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'automatic filename' '
> +	test_when_finished "rm -r traces actual expect" &&
> +	mkdir traces &&
> +	GIT_TR2="$(pwd)/traces" test-tool trace2 001return 0 &&
> +	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <"$(ls traces/*)" >actual &&
> +	cat >expect <<-EOF &&
> +		version $V
> +		start _EXE_ trace2 001return 0
> +		cmd_name trace2 (trace2)
> +		exit elapsed:_TIME_ code:0
> +		atexit elapsed:_TIME_ code:0
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  # Verb 002exit
>  #
>  # Explicit exit(code) from within cmd_<verb> propagates <code>.
> diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
> index fd490a43ad..c3d82ca6a4 100644
> --- a/trace2/tr2_dst.c
> +++ b/trace2/tr2_dst.c
> @@ -1,5 +1,6 @@
>  #include "cache.h"
>  #include "trace2/tr2_dst.h"
> +#include "trace2/tr2_sid.h"
>
>  /*
>   * If a Trace2 target cannot be opened for writing, we should issue a
> @@ -12,6 +13,11 @@
>   */
>  #define TR2_ENVVAR_DST_DEBUG "GIT_TR2_DST_DEBUG"
>
> +/*
> + * How many attempts we will make at creating an automatically-named trace file.
> + */
> +#define MAX_AUTO_ATTEMPTS 10
> +
>  static int tr2_dst_want_warning(void)
>  {
>  	static int tr2env_dst_debug = -1;
> @@ -36,6 +42,55 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
>  	dst->need_close = 0;
>  }
>
> +static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
> +{
> +	int fd;
> +	const char *last_slash, *sid = tr2_sid_get();
> +	struct strbuf path = STRBUF_INIT;
> +	size_t base_path_len;
> +	unsigned attempt_count;
> +
> +	last_slash = strrchr(sid, '/');
> +	if (last_slash)
> +		sid = last_slash + 1;
> +
> +	strbuf_addstr(&path, tgt_prefix);
> +	if (!is_dir_sep(path.buf[path.len - 1]))
> +		strbuf_addch(&path, '/');
> +	strbuf_addstr(&path, sid);
> +	base_path_len = path.len;
> +
> +	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
> +		if (attempt_count > 0) {
> +			strbuf_setlen(&path, base_path_len);
> +			strbuf_addf(&path, ".%d", attempt_count);
> +		}
> +
> +		fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, 0666);
> +		if (fd != -1)
> +			break;
> +	}
> +
> +	if (fd == -1) {
> +		if (tr2_dst_want_warning())
> +			warning("trace2: could not open '%.*s' for '%s' tracing: %s",
> +				(int) base_path_len, path.buf,
> +				dst->env_var_name, strerror(errno));
> +
> +		tr2_dst_trace_disable(dst);
> +		strbuf_release(&path);
> +		return 0;
> +	}
> +
> +	strbuf_release(&path);
> +
> +	dst->fd = fd;
> +	dst->need_close = 1;
> +	dst->initialized = 1;
> +
> +	return dst->fd;
> +}
> +
>  static int tr2_dst_try_path(struct tr2_dst *dst, const char *tgt_value)
>  {
>  	int fd = open(tgt_value, O_WRONLY | O_APPEND | O_CREAT, 0666);
> @@ -202,8 +257,12 @@ int tr2_dst_get_trace_fd(struct tr2_dst *dst)
>  		return dst->fd;
>  	}
>
> -	if (is_absolute_path(tgt_value))
> -		return tr2_dst_try_path(dst, tgt_value);
> +	if (is_absolute_path(tgt_value)) {
> +		if (is_directory(tgt_value))
> +			return tr2_dst_try_auto_path(dst, tgt_value);
> +		else
> +			return tr2_dst_try_path(dst, tgt_value);
> +	}
>
>  #ifndef NO_UNIX_SOCKETS
>  	if (starts_with(tgt_value, PREFIX_AF_UNIX))

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/1] trace2: write to directory targets
  2019-03-23 20:44     ` Ævar Arnfjörð Bjarmason
@ 2019-03-24 12:33       ` Junio C Hamano
  2019-03-24 14:51         ` Ævar Arnfjörð Bjarmason
  2019-03-25 16:29       ` Jeff Hostetler
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2019-03-24 12:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Josh Steadmon, git, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> The reason I'm raising this is that it seems like sweeping an existing
> issue under the rug. We document that the "sid" is "unique", and it's just:
>
>     <nanotime / 1000 (i.e. *nix time in microseconds)>-<pid>

If it is just that, then it cannot be unique, can it?

Let's just fix the wrong doc and move on.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/1] trace2: write to directory targets
  2019-03-24 12:33       ` Junio C Hamano
@ 2019-03-24 14:51         ` Ævar Arnfjörð Bjarmason
  2019-03-25  2:21           ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-24 14:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git, git


On Sun, Mar 24 2019, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> The reason I'm raising this is that it seems like sweeping an existing
>> issue under the rug. We document that the "sid" is "unique", and it's just:
>>
>>     <nanotime / 1000 (i.e. *nix time in microseconds)>-<pid>
>
> If it is just that, then it cannot be unique, can it?
>
> Let's just fix the wrong doc and move on.

I don't see why we wouldn't just fix the SID generation & move on if
we're already carrying code purely as a work-around for it not being
unique enough.

Of course nothing is *guaranteed* to be unique, not even a 128-bit
UUID. The point is to pick something that's "unique enough" given the
problem space, which is already one where we'll ignore I/O errors on
writing the file unless you opt-in to a warning.

And it's a very useful property to have the SID be a) unique enough like
that so you can use a unique index for "git events" b) fixed-width
(which the current one *isn't*), so you can use fixed-with indexes in
some logging databases c) as a bonus, be human-readable at a glance,
hence spending slightly more space on the the YYYMM.. format in my
suggested SID format.

I can write that patch, but first it's useful to know if this is a case
Josh is running into, or if it's just a guard out of paranoia.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/1] trace2: write to directory targets
  2019-03-24 14:51         ` Ævar Arnfjörð Bjarmason
@ 2019-03-25  2:21           ` Junio C Hamano
  2019-03-25  8:21             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2019-03-25  2:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Josh Steadmon, git, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Sun, Mar 24 2019, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>> The reason I'm raising this is that it seems like sweeping an existing
>>> issue under the rug. We document that the "sid" is "unique", and it's just:
>>>
>>>     <nanotime / 1000 (i.e. *nix time in microseconds)>-<pid>
>>
>> If it is just that, then it cannot be unique, can it?
>>
>> Let's just fix the wrong doc and move on.
>
> I don't see why we wouldn't just fix the SID generation & move on if
> we're already carrying code purely as a work-around for it not being
> unique enough.

The thing is, the yardstick to determine "unique enough" depends on
the caller.  In this codepath, we want the uniqueness within the
directory that was given to us, and one reasonable way, among the
most sensible ones, is to ask open(O_CREAT|O_EXCL) and it makes 100%
sense to fall back with suffix when "enough" thought by the callee
turns out to be insufficient when judged by the caller.

So I do not see the .%d suffix a work-around at all.  I view it as
an integral part of the whole package.

By the way, is the "nanotime" implementation that may be in compat/
fine grained enough?

> Of course nothing is *guaranteed* to be unique, not even a 128-bit
> UUID. The point is to pick something that's "unique enough" given the
> problem space, which is already one where we'll ignore I/O errors on
> writing the file unless you opt-in to a warning.

Yes, the point is to pick something that is unique enough and then
give a reasonable fallback when it turns out insufficient.  I think
".%d" suffix is one reasonable fallback, but I realize that it is
not the only reasonable fallback.  Another reasonable fallback could
be "upon seeing a failure of open(O_CREAT|O_EXCL), we give up and do
not leave a logfile, because this should be a rare enough event as
our assumption is sid is unique enough for everyday operation".

I could buy that, especially if the ".%d" suffix fallback is too
expensive to carry and maintain into the future.  And in such a
case, it indeed would be a more reasonable workaround for a rare
non-unique sid problem to ignore and discard the log.

I just did not think the ".%d" suffix fallback is too expensive to
carry.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/1] trace2: write to directory targets
  2019-03-25  2:21           ` Junio C Hamano
@ 2019-03-25  8:21             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 34+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-25  8:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git, git


On Mon, Mar 25 2019, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Sun, Mar 24 2019, Junio C Hamano wrote:
>>
>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>>
>>>> The reason I'm raising this is that it seems like sweeping an existing
>>>> issue under the rug. We document that the "sid" is "unique", and it's just:
>>>>
>>>>     <nanotime / 1000 (i.e. *nix time in microseconds)>-<pid>
>>>
>>> If it is just that, then it cannot be unique, can it?
>>>
>>> Let's just fix the wrong doc and move on.
>>
>> I don't see why we wouldn't just fix the SID generation & move on if
>> we're already carrying code purely as a work-around for it not being
>> unique enough.
>
> The thing is, the yardstick to determine "unique enough" depends on
> the caller.  In this codepath, we want the uniqueness within the
> directory that was given to us, and one reasonable way, among the
> most sensible ones, is to ask open(O_CREAT|O_EXCL) and it makes 100%
> sense to fall back with suffix when "enough" thought by the callee
> turns out to be insufficient when judged by the caller.
>
> So I do not see the .%d suffix a work-around at all.  I view it as
> an integral part of the whole package.
>
> By the way, is the "nanotime" implementation that may be in compat/
> fine grained enough?
>
>> Of course nothing is *guaranteed* to be unique, not even a 128-bit
>> UUID. The point is to pick something that's "unique enough" given the
>> problem space, which is already one where we'll ignore I/O errors on
>> writing the file unless you opt-in to a warning.
>
> Yes, the point is to pick something that is unique enough and then
> give a reasonable fallback when it turns out insufficient.  I think
> ".%d" suffix is one reasonable fallback, but I realize that it is
> not the only reasonable fallback.  Another reasonable fallback could
> be "upon seeing a failure of open(O_CREAT|O_EXCL), we give up and do
> not leave a logfile, because this should be a rare enough event as
> our assumption is sid is unique enough for everyday operation".
>
> I could buy that, especially if the ".%d" suffix fallback is too
> expensive to carry and maintain into the future.  And in such a
> case, it indeed would be a more reasonable workaround for a rare
> non-unique sid problem to ignore and discard the log.
>
> I just did not think the ".%d" suffix fallback is too expensive to
> carry.

Oh yeah, the SID.%d fallback is easy enough, and I'd 100% agree that if
that was *all* it was this would all make sense. Let's just retry, just
like a tempfile implementation will retry.

But I'm not interested in using the SID-per-file format Josh is adding
here, but *am* interested in having a logging system where over a bunch
of machines I can expect the SID to be unique with a high degree of
probability.

So this open(O_CREAT|O_EXCL) code is revealing a problem that we're also
going to have in a different form in the
GIT_TR2_EVENT=/var/log/one-big-file-of.json format.

It seems to me that the only reason both of these are an issue is
something we can fix with the SID generation.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/1] trace2: write to directory targets
  2019-03-23 20:44     ` Ævar Arnfjörð Bjarmason
  2019-03-24 12:33       ` Junio C Hamano
@ 2019-03-25 16:29       ` Jeff Hostetler
  1 sibling, 0 replies; 34+ messages in thread
From: Jeff Hostetler @ 2019-03-25 16:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Josh Steadmon; +Cc: git, gitster



On 3/23/2019 4:44 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Mar 21 2019, Josh Steadmon wrote:
> 
>> When the value of a trace2 environment variable is an absolute path
>> referring to an existing directory, write output to files (one per
>> process) underneath the given directory. Files will be named according
>> to the final component of the trace2 SID, followed by a counter to avoid
>> potential collisions.
> 
[...]
> 
> The reason I'm raising this is that it seems like sweeping an existing
> issue under the rug. We document that the "sid" is "unique", and it's just:
> 
>      <nanotime / 1000 (i.e. *nix time in microseconds)>-<pid>
> 
> So that might be a lie, and in particular I can imagine that say if
> every machine at Google is logging traces into some magic mounted FS
> that there'll be collisions there.
> 
> But then let's *fix that*, because we're also e.g. going to have other
> consumers of these traces using the sid's as primary keys in a logging
> system.
> 
> I wonder if we should just make it a bit longer, human-readable, and
> include a hash of the hostname:
> 
>      perl -MTime::HiRes=gettimeofday -MSys::Hostname -MDigest::SHA=sha1_hex -MPOSIX=strftime -wE '
>          my ($t, $m) = gettimeofday;
>          my $host_hex = substr sha1_hex(hostname()), 0, 8;
>          my $htime = strftime("%Y%m%d%H%M%S", localtime);
>          my $sid = sprintf("%s-%6d-%s-%s",
>              $htime,
>              $m,
>              $host_hex,
>              $$ & 0xFFFF,
>          );
>          say $sid;
>      '
> 
> Which gets you a SID like:
> 
>      20190323213918-404788-c2f5b994-19027
> 
> I.e.:
> 
>      <YYYYMMDDHHMMSS>-<microsecond-offset>-<8 chars of sha1(hostname -f)>-<pid>
> 
> There's obviously ways to make that more compact, but in this case I
> couldn't see a reason to, also using UTC would be a good idea.
> 
> All the trace2 tests pass if I fake that up. Jeff H: Do you have
> anything that relies on the current format?
I'm using the SID hierarchy to track parent and child processes,
but the actual format of an individual SID-component is mostly a
black box.

I used the microseconds+pid as unique enough.  And events for new
commands will mostly just append to an existing index, rather than
being a random insert like you'd get for a GUID.

I didn't use a GUID here because that seemed overkill and a little
bit more expensive, but perhaps that was just premature optimization
on my part.


So, a new fixed width format like you suggested above would be fine.
I wonder though, if we're moving towards a stronger SID, there's no
reason to keep the PID in it.  Which makes me wonder about the value
of sha(hostname) too.  Perhaps, just make it a GUID or some combination
of the UTC date and a GUID ( <YYMMDDHHMMSS>-<microseconds>-<GUID> ) or
something like that.

If it helps, we can change how I'm reporting the SID between parent
and child processes, so that the SID field in the JSON events is
just the SID of the current process and have a peer field with the
SID-hierarchy.  This latter field would only need to be added to the
"version" or "start" event.  This might make post-processing a little
easier.  Not sure it matters one way or the other.

I'm open to suggestions here.

Jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2019-03-25 16:29 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-13 23:33 [PATCH 0/2] Randomize / timestamp trace2 targets Josh Steadmon
2019-03-13 23:33 ` [PATCH 1/2] date: make get_time() public Josh Steadmon
2019-03-13 23:33 ` [PATCH 2/2] trace2: randomize/timestamp trace2 targets Josh Steadmon
2019-03-13 23:49   ` Ævar Arnfjörð Bjarmason
2019-03-15 18:39     ` Jeff Hostetler
2019-03-15 19:26       ` Ævar Arnfjörð Bjarmason
2019-03-15 20:14         ` Jeff Hostetler
2019-03-15 20:43     ` Josh Steadmon
2019-03-15 20:49       ` Josh Steadmon
2019-03-18  1:40         ` Junio C Hamano
2019-03-19  3:17           ` Jeff King
2019-03-14  0:16   ` Jeff King
2019-03-14  6:07     ` Junio C Hamano
2019-03-14 14:34 ` [PATCH 0/2] Randomize / timestamp " Johannes Schindelin
2019-03-15 20:37   ` Josh Steadmon
2019-03-15 19:18 ` Jeff Hostetler
2019-03-15 20:38   ` Josh Steadmon
2019-03-18 12:50     ` Jeff Hostetler
2019-03-21  0:16 ` [PATCH v2 0/1] Write trace2 output to directories Josh Steadmon
2019-03-21  0:16   ` [PATCH v2 1/1] trace2: write to directory targets Josh Steadmon
2019-03-21  2:04     ` Junio C Hamano
2019-03-21 17:43       ` Jeff Hostetler
2019-03-22  3:30         ` Junio C Hamano
2019-03-22 14:20           ` Jeff Hostetler
2019-03-21 21:09 ` [PATCH v3 0/1] Write trace2 output to directories Josh Steadmon
2019-03-21 21:09   ` [PATCH v3 1/1] trace2: write to directory targets Josh Steadmon
2019-03-23 20:44     ` Ævar Arnfjörð Bjarmason
2019-03-24 12:33       ` Junio C Hamano
2019-03-24 14:51         ` Ævar Arnfjörð Bjarmason
2019-03-25  2:21           ` Junio C Hamano
2019-03-25  8:21             ` Ævar Arnfjörð Bjarmason
2019-03-25 16:29       ` Jeff Hostetler
2019-03-21 21:16   ` [PATCH v3 0/1] Write trace2 output to directories Jeff Hostetler
2019-03-22  5:23     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).