From: Jeff Hostetler <git@jeffhostetler.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Josh Steadmon" <steadmon@google.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH v3 1/1] trace2: write to directory targets
Date: Mon, 25 Mar 2019 12:29:32 -0400
Message-ID: <11e8e140-c2b6-8234-e6a3-affe69286cbf@jeffhostetler.com>
In-Reply-To: <87bm21coco.fsf@evledraar.gmail.com>



On 3/23/2019 4:44 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Mar 21 2019, Josh Steadmon wrote:
> 
>> When the value of a trace2 environment variable is an absolute path
>> referring to an existing directory, write output to files (one per
>> process) underneath the given directory. Files will be named according
>> to the final component of the trace2 SID, followed by a counter to avoid
>> potential collisions.
> 
[...]
> 
> The reason I'm raising this is that it seems like sweeping an existing
> issue under the rug. We document that the "sid" is "unique", and it's just:
> 
>      <nanotime / 1000 (i.e. *nix time in microseconds)>-<pid>
> 
> So that might be a lie, and in particular I can imagine that say if
> every machine at Google is logging traces into some magic mounted FS
> that there'll be collisions there.
> 
> But then let's *fix that*, because we're also e.g. going to have other
> consumers of these traces using the sid's as primary keys in a logging
> system.
> 
> I wonder if we should just make it a bit longer, human-readable, and
> include a hash of the hostname:
> 
>      perl -MTime::HiRes=gettimeofday -MSys::Hostname -MDigest::SHA=sha1_hex -MPOSIX=strftime -wE '
>          my ($t, $m) = gettimeofday;
>          my $host_hex = substr sha1_hex(hostname()), 0, 8;
>          my $htime = strftime("%Y%m%d%H%M%S", localtime);
>          my $sid = sprintf("%s-%6d-%s-%s",
>              $htime,
>              $m,
>              $host_hex,
>              $$ & 0xFFFF,
>          );
>          say $sid;
>      '
> 
> Which gets you a SID like:
> 
>      20190323213918-404788-c2f5b994-19027
> 
> I.e.:
> 
>      <YYYYMMDDHHMMSS>-<microsecond-offset>-<8 chars of sha1(hostname -f)>-<pid>
> 
> There's obviously ways to make that more compact, but in this case I
> couldn't see a reason to, also using UTC would be a good idea.
> 
> All the trace2 tests pass if I fake that up. Jeff H: Do you have
> anything that relies on the current format?

I'm using the SID hierarchy to track parent and child processes,
but the actual format of an individual SID-component is mostly a
black box.

I chose microseconds+pid as unique enough.  Because the SID begins
with a timestamp, events for new commands will mostly just append to
an existing index, rather than being a random insert like you'd get
for a GUID.
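
To illustrate the append-versus-random-insert point, here is a minimal
sketch. It assumes the index is a plain sorted list of string keys; the
helper name insert_positions and the sample timestamps and PIDs are made
up for illustration and are not taken from trace2.

```python
import bisect
import uuid


def insert_positions(keys):
    """Insert keys one by one into a sorted index.

    Returns, for each key, how far from the end of the index it landed.
    A distance of 0 means the insert was a pure append.
    """
    index = []
    distances = []
    for key in keys:
        pos = bisect.bisect_right(index, key)
        distances.append(len(index) - pos)
        index.insert(pos, key)
    return distances


# Time-prefixed keys in the <microseconds>-<pid> shape (values are made up).
time_keys = ["{:016d}-{}".format(1553531372000000 + i * 137, 4000 + i)
             for i in range(5)]
# Random GUIDs created in the same order.
guid_keys = [uuid.uuid4().hex for _ in range(5)]

print(insert_positions(time_keys))  # [0, 0, 0, 0, 0]: every insert appends
print(insert_positions(guid_keys))  # varies from run to run
```

The time-prefixed keys always land at the end because each one sorts
after all earlier ones; random GUIDs carry no such ordering.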

I didn't use a GUID here because that seemed overkill and a little
bit more expensive, but perhaps that was just premature optimization
on my part.


So, a new fixed-width format like you suggested above would be fine.
I wonder though: if we're moving towards a stronger SID, there's no
reason to keep the PID in it, which makes me wonder about the value
of sha(hostname) too.  Perhaps just make it a GUID, or some combination
of the UTC date and a GUID ( <YYMMDDHHMMSS>-<microseconds>-<GUID> ), or
something like that.
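
A rough sketch of the <YYMMDDHHMMSS>-<microseconds>-<GUID> shape, for
discussion only. It assumes a random (version 4) UUID rendered as 32 hex
digits stands in for the GUID; the function name make_sid is
hypothetical and this is not how trace2 builds its SID.

```python
import uuid
from datetime import datetime, timezone


def make_sid(now=None, guid=None):
    """Build a fixed-width SID: <YYMMDDHHMMSS>-<microseconds>-<GUID>."""
    # Use UTC so SIDs from machines in different time zones sort together.
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: a random UUID serves as the GUID component.
    if guid is None:
        guid = uuid.uuid4()
    # %06d zero-pads the microseconds so the field is always six digits.
    return "{}-{:06d}-{}".format(
        now.strftime("%y%m%d%H%M%S"), now.microsecond, guid.hex)


print(make_sid())
```

Every SID built this way is 52 characters long (12 + 6 + 32 plus two
separators), and the leading UTC timestamp keeps the append-mostly
ordering of the current format.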

If it helps, we can change how I'm reporting the SID between parent
and child processes, so that the SID field in the JSON events is
just the SID of the current process, with the SID hierarchy in a
separate peer field.  This latter field would only need to be added to
the "version" or "start" event.  This might make post-processing a
little easier.  Not sure it matters one way or the other.
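
One way the split could look, as a sketch only. It assumes the SID
hierarchy is the parent and child SIDs joined with "/"; the peer field
name "sid_hierarchy" and the helper make_event are hypothetical, and the
sample SIDs are made up.

```python
import json


def make_event(event_name, sid_path):
    """Build one event dict from a slash-separated SID hierarchy."""
    # The last component is the SID of the current process.
    own_sid = sid_path.rsplit("/", 1)[-1]
    event = {"event": event_name, "sid": own_sid}
    # Hypothetical peer field: only "version" and "start" carry the
    # full parent/child hierarchy.
    if event_name in ("version", "start"):
        event["sid_hierarchy"] = sid_path
    return event


path = "1553531372000001-100/1553531372000450-101"
for name in ("version", "start", "exit"):
    print(json.dumps(make_event(name, path)))
```

A post-processor could then key every event on "sid" alone and read the
parent chain once, from the "version" or "start" event.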

I'm open to suggestions here.

Jeff

