From: Jonathan Nieder <jrnieder@gmail.com>
To: git@jeffhostetler.com
Cc: git@vger.kernel.org, gitster@pobox.com, peff@peff.net,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v1 01/25] structured-logging: design document
Date: Mon, 20 Aug 2018 21:47:24 -0700 [thread overview]
Message-ID: <20180821044724.GA219616@aiede.svl.corp.google.com> (raw)
In-Reply-To: <20180713165621.52017-2-git@jeffhostetler.com>
Hi,
Jeff Hostetler wrote:
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
> Documentation/technical/structured-logging.txt | 816 +++++++++++++++++++++++++
> 1 file changed, 816 insertions(+)
> create mode 100644 Documentation/technical/structured-logging.txt
Can you add this to Documentation/Makefile as well, so that an HTML
version will appear once this is merged? See
https://public-inbox.org/git/20180814222846.GG142615@aiede.svl.corp.google.com/
for an example.
[...]
> +++ b/Documentation/technical/structured-logging.txt
> @@ -0,0 +1,816 @@
> +Structured Logging
> +==================
> +
> +Structured Logging (SLOG) is an optional feature to allow Git to
> +generate structured log data for executed commands. This includes
> +command line arguments, command run times, error codes and messages,
> +child process information, time spent in various critical functions,
> +and repository data-shape information. Data is written to a target
> +log file in JSON[1,2,3] format.
I like the idea of more structured logs for tracing, monitoring, and
diagnosis (see also
https://research.google.com/archive/papers/dapper-2010-1.pdf on the
subject of tracing). My main focus in looking over this initial
contribution is
1. Is this something that could eventually merge with our other
tracing APIs (e.g. trace_printf), or will they remain separate?
2. Is the API introduced here one that will be easy to work with
and adapt? What will calling code that makes use of it look like?
[...]
> +Background (Git Merge 2018 Barcelona)
> +=====================================
> +
> +Performance and/or error logging was discussed during the contributor's
> +summit in Barcelona. Here are the relevant notes from the meeting
> +minutes[6].
> +
> +> Performance misc (Ævar)
> +> -----------------------
> +> [...]
> +> - central error reporting for git
> +> - `git status` logging
> +> - git config that collects data, pushes to known endpoint with `git push`
> +> - pre_command and post_command hooks, for logs
> +> - `gvfs diagnose` that looks at packfiles, etc
[etc]
I'm not sure what to make of this section. Is it a historical
artifact, or should I consider it part of the design?
[...]
> +A Quick Example
> +===============
> +
> +Note: JSON pretty-printing is enabled in all of the examples shown in
> +this document. When pretty-printing is turned off, each event is
> +written on a single line. Pretty-printing is intended for debugging.
> +It should be turned off in production to make post-processing easier.
> +
> + $ git config slog.pretty <bool>
> +
> +Here is a quick example showing SLOG data for "git status". This
> +example has all optional features turned off. It contains 2 events.
Thanks for the example! I'd also be interested in what the tracing
code looks like, since the API is the 'deepest' change involved (the
tracing format can easily change later based on experience).
My initial reaction to the example trace is that it feels like an odd
compromise: on one hand it uses text (JSON) to be human-readable, and
on the other hand it is structured to be machine-readable. The JSON
writer isn't using a standard JSON serialization library so parsing
difficulties seem likely, and the output is noisy enough that reading
it by hand is hard, too. That leads me to wish for a different
format, like protobuf (where the binary form is very concise and
parseable, and the textual form is IMHO more readable than JSON).
All that said, this is just the first tracing output format and it's
easy to make new ones later. It seems fine for that purpose.
[...]
> +Target Log File
> +===============
> +
> +SLOG writes events to a log file. File logging works much like
> +GIT_TRACE where events are appended to a file on disk.
Does it write with O_APPEND? Does it do anything to guard against
interleaved events --- e.g. are messages buffered and then written
with a single write() call?
[...]
> +Comparison with GIT_TRACE
> +=========================
> +
> +SLOG is very similar to the existing GIT_TRACE[4] API because both
> +write event messages at various points during a command. However,
> +there are some fundamental differences that warrant it being
> +considered a separate feature rather than just another
> +GIT_TRACE_<key>:
The natural question would be in the opposite direction: can and
should trace_printf be reimplemented on top of this feature?
[...]
> +
> +
nit: trailing whitespace (you can find these with "git show --check").
[...]
> +A session id (SID) is a cheap, unique-enough string to associate all
> +of the events generated by a single process. A child git process inherits
> +the SID of their parent git process and incorporates it into their SID.
I wonder if we can structure events in a more hierarchical way to
avoid having to special-case the toplevel command in this way.
[...]
> + # The "cmd_exit" event includes the command result and elapsed
> + # time and the various configuration settings. During the run
> + # "index" category timers were placed around the do_read_index()
> + # and "preload()" calls and various "status" category timers were
> + # placed around the 3 major parts of the status computation.
> + # Lastly, an "index" category "aux" data item was added to report
> + # the size of the index.
Especially for tracing and monitoring applications, it would be
helpful if more of the information were emitted earlier instead of
waiting for exit. Especially if I am trying to trace the cause of a
hanging command, information that is only printed at exit does not
help me.
The main missing bit in this doc is a sketch of the API. Looking
forward to finding that in the header. ;-)
Sorry for the slow review, and thanks for your thoughtful work.
Sincerely,
Jonathan
next prev parent reply other threads:[~2018-08-21 4:47 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
2018-07-13 16:55 ` [PATCH v1 01/25] structured-logging: design document git
2018-07-14 8:34 ` Simon Ruderich
2018-08-03 15:26 ` Ben Peart
2018-08-09 14:30 ` Jeff Hostetler
2018-08-21 4:47 ` Jonathan Nieder [this message]
2018-07-13 16:55 ` [PATCH v1 02/25] structured-logging: add STRUCTURED_LOGGING=1 to Makefile git
2018-08-21 4:49 ` Jonathan Nieder
2018-07-13 16:55 ` [PATCH v1 03/25] structured-logging: add structured logging framework git
2018-07-26 9:09 ` SZEDER Gábor
2018-07-27 12:45 ` Jeff Hostetler
2018-08-21 5:05 ` Jonathan Nieder
2018-07-13 16:56 ` [PATCH v1 04/25] structured-logging: add session-id to log events git
2018-07-13 16:56 ` [PATCH v1 05/25] structured-logging: set sub_command field for branch command git
2018-07-13 16:56 ` [PATCH v1 06/25] structured-logging: set sub_command field for checkout command git
2018-07-13 16:56 ` [PATCH v1 07/25] structured-logging: t0420 basic tests git
2018-07-13 16:56 ` [PATCH v1 08/25] structured-logging: add detail-event facility git
2018-07-13 16:56 ` [PATCH v1 09/25] structured-logging: add detail-event for lazy_init_name_hash git
2018-07-13 16:56 ` [PATCH v1 10/25] structured-logging: add timer facility git
2018-07-13 16:56 ` [PATCH v1 11/25] structured-logging: add timer around do_read_index git
2018-07-13 16:56 ` [PATCH v1 12/25] structured-logging: add timer around do_write_index git
2018-07-13 16:56 ` [PATCH v1 13/25] structured-logging: add timer around wt-status functions git
2018-07-13 16:56 ` [PATCH v1 14/25] structured-logging: add timer around preload_index git
2018-07-13 16:56 ` [PATCH v1 15/25] structured-logging: t0420 tests for timers git
2018-07-13 16:56 ` [PATCH v1 16/25] structured-logging: add aux-data facility git
2018-07-13 16:56 ` [PATCH v1 17/25] structured-logging: add aux-data for index size git
2018-07-13 16:56 ` [PATCH v1 18/25] structured-logging: add aux-data for size of sparse-checkout file git
2018-07-13 16:56 ` [PATCH v1 19/25] structured-logging: t0420 tests for aux-data git
2018-07-13 16:56 ` [PATCH v1 20/25] structured-logging: add structured logging to remote-curl git
2018-07-13 16:56 ` [PATCH v1 21/25] structured-logging: add detail-events for child processes git
2018-07-13 16:56 ` [PATCH v1 22/25] structured-logging: add child process classification git
2018-07-13 16:56 ` [PATCH v1 23/25] structured-logging: t0420 tests for child process detail events git
2018-07-13 16:56 ` [PATCH v1 24/25] structured-logging: t0420 tests for interacitve child_summary git
2018-07-13 16:56 ` [PATCH v1 25/25] structured-logging: add config data facility git
2018-07-13 18:51 ` [PATCH v1 00/25] RFC: structured logging David Lang
2018-07-16 13:29 ` Jeff Hostetler
2018-08-28 17:38 ` Junio C Hamano
2018-08-28 18:47 ` Jeff Hostetler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180821044724.GA219616@aiede.svl.corp.google.com \
--to=jrnieder@gmail.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).