git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v1 00/25] RFC: structured logging
@ 2018-07-13 16:55 git
  2018-07-13 16:55 ` [PATCH v1 01/25] structured-logging: design document git
                   ` (26 more replies)
  0 siblings, 27 replies; 38+ messages in thread
From: git @ 2018-07-13 16:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

This RFC patch series adds structured logging to git.  The motivation,
background, and limitations of this feature are described at the
beginning of the design document in the first commit.  The design
document also contains a section comparing this feature with the
existing GIT_TRACE feature.  So I won't go into great detail here in
the cover letter.

My primary focus in this RFC is to reach agreement on the structured
logging facility.  This includes the basic approach and the various
logging fields and timers.

This patch series also includes several example usage commits, such as
adding timers around do_{read,write}_index, that demonstrate the
capabilities of the structured logging facility.  I only added a few
examples for things that I think we'll want long-term.  I did not
attempt to instrument everything.

This patch series requires V11 of my json-writer patch series.


Jeff Hostetler (25):
  structured-logging: design document
  structured-logging: add STRUCTURED_LOGGING=1 to Makefile
  structured-logging: add structured logging framework
  structured-logging: add session-id to log events
  structured-logging: set sub_command field for branch command
  structured-logging: set sub_command field for checkout command
  structured-logging: t0420 basic tests
  structured-logging: add detail-event facility
  structured-logging: add detail-event for lazy_init_name_hash
  structured-logging: add timer facility
  structured-logging: add timer around do_read_index
  structured-logging: add timer around do_write_index
  structured-logging: add timer around wt-status functions
  structured-logging: add timer around preload_index
  structured-logging: t0420 tests for timers
  structured-logging: add aux-data facility
  structured-logging: add aux-data for index size
  structured-logging: add aux-data for size of sparse-checkout file
  structured-logging: t0420 tests for aux-data
  structured-logging: add structured logging to remote-curl
  structured-logging: add detail-events for child processes
  structured-logging: add child process classification
  structured-logging: t0420 tests for child process detail events
  structured-logging: t0420 tests for interacitve child_summary
  structured-logging: add config data facility

 Documentation/config.txt                       |   33 +
 Documentation/git.txt                          |    6 +
 Documentation/technical/structured-logging.txt |  816 ++++++++++++++++
 Makefile                                       |    8 +
 builtin/branch.c                               |    8 +
 builtin/checkout.c                             |    7 +
 compat/mingw.h                                 |    7 +
 config.c                                       |    3 +
 editor.c                                       |    1 +
 git-compat-util.h                              |    9 +
 git.c                                          |   10 +-
 name-hash.c                                    |   26 +
 pager.c                                        |    1 +
 preload-index.c                                |    6 +
 read-cache.c                                   |   14 +
 remote-curl.c                                  |   16 +-
 run-command.c                                  |   14 +-
 run-command.h                                  |    2 +
 structured-logging.c                           | 1219 ++++++++++++++++++++++++
 structured-logging.h                           |  179 ++++
 sub-process.c                                  |    1 +
 t/t0001-init.sh                                |    1 +
 t/t0420-structured-logging.sh                  |  293 ++++++
 t/t0420/parse_json.perl                        |   52 +
 t/test-lib.sh                                  |    1 +
 unpack-trees.c                                 |    4 +-
 usage.c                                        |    4 +
 wt-status.c                                    |   20 +
 28 files changed, 2757 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/technical/structured-logging.txt
 create mode 100644 structured-logging.c
 create mode 100644 structured-logging.h
 create mode 100755 t/t0420-structured-logging.sh
 create mode 100644 t/t0420/parse_json.perl

-- 
2.9.3


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v1 01/25] structured-logging: design document
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
@ 2018-07-13 16:55 ` git
  2018-07-14  8:34   ` Simon Ruderich
                     ` (2 more replies)
  2018-07-13 16:55 ` [PATCH v1 02/25] structured-logging: add STRUCTURED_LOGGING=1 to Makefile git
                   ` (25 subsequent siblings)
  26 siblings, 3 replies; 38+ messages in thread
From: git @ 2018-07-13 16:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/structured-logging.txt | 816 +++++++++++++++++++++++++
 1 file changed, 816 insertions(+)
 create mode 100644 Documentation/technical/structured-logging.txt

diff --git a/Documentation/technical/structured-logging.txt b/Documentation/technical/structured-logging.txt
new file mode 100644
index 0000000..794c614
--- /dev/null
+++ b/Documentation/technical/structured-logging.txt
@@ -0,0 +1,816 @@
+Structured Logging
+==================
+
+Structured Logging (SLOG) is an optional feature to allow Git to
+generate structured log data for executed commands.  This includes
+command line arguments, command run times, error codes and messages,
+child process information, time spent in various critical functions,
+and repository data-shape information.  Data is written to a target
+log file in JSON[1,2,3] format.
+
+SLOG is disabled by default.  Several steps are required to enable it:
+
+1. Add the compile-time flag "STRUCTURED_LOGGING=1" when building git
+   to include the SLOG routines in the git executable.
+
+2. Set "slog.*" config settings[5] to enable SLOG in your repo.
+
+
+Motivation
+==========
+
+Git users may be faced with scenarios that are surprisingly slow or
+produce unexpected results.  And Git developers may have difficulty
+reproducing these experiences.  Structured logging allows users to
+provide developers with additional usage, performance and error data
+that can help diagnose and debug issues.
+
+Many Git hosting providers and users with many developers have bespoke
+efforts to help troubleshoot problems; for example, command wrappers,
+custom pre- and post-command hooks, and custom instrumentation of Git
+code.  These are inefficient and/or difficult to maintain.  The goal
+of SLOG is to provide this data as efficiently as possible.
+
+And having structured rather than free format log data, will help
+developers with their analysis.
+
+
+Background (Git Merge 2018 Barcelona)
+=====================================
+
+Performance and/or error logging was discussed during the contributor's
+summit in Barcelona.  Here are the relevant notes from the meeting
+minutes[6].
+
+> Performance misc (Ævar)
+> -----------------------
+> [...]
+>  - central error reporting for git
+>    - `git status` logging
+>    - git config that collects data, pushes to known endpoint with `git push`
+>    - pre_command and post_command hooks, for logs
+>    - `gvfs diagnose` that looks at packfiles, etc
+>    - detect BSODs, etc
+>    - Dropbox writes out json with index properties and command-line
+>        information for status/fetch/push, fork/execs external tool to upload
+>    - windows trace facility; would be nice to have cross-platform
+>    - would hosting providers care?
+>    - zipfile of logs to give when debugging
+>    - sanitizing data is harder
+>    - more in a company setting
+>    - fileshare to upload zipfile
+>    - most of the errors are proxy when they shouldn't, wrong proxy, proxy
+>        specific to particular URL; so upload endpoint wouldn't work
+>    - GIT_TRACE is supposed to be that (for proxy)
+>    - but we need more trace variables
+>    - series to make tracing cheaper
+>    - except that curl selects the proxy
+>    - trace should have an API, so it can call an executable
+>    - dump to .git/traces/... and everything else happens externally
+>    - tools like visual studio can't set GIT_TRACE, so
+>    - sourcetree has seen user environments where commands just take forever
+>    - third-party tools like perf/strace - could we be better leveraging those?
+>    - distribute turn-key solution to handout to collect more data?
+
+
+A Quick Example
+===============
+
+Note: JSON pretty-printing is enabled in all of the examples shown in
+this document.  When pretty-printing is turned off, each event is
+written on a single line.  Pretty-printing is intended for debugging.
+It should be turned off in production to make post-processing easier.
+
+    $ git config slog.pretty <bool>
+
+Here is a quick example showing SLOG data for "git status".  This
+example has all optional features turned off.  It contains 2 events.
+The first is generated when the command started and the second when it
+ended.
+
+{
+  "event": "cmd_start",
+  "clock_us": 1530273550667800,
+  "pid": 107270,
+  "sid": "1530273550667800-107270",
+  "command": "status",
+  "argv": [
+    "./git",
+    "status"
+  ]
+}
+{
+  "event": "cmd_exit",
+  "clock_us": 1530273550680460,
+  "pid": 107270,
+  "sid": "1530273550667800-107270",
+  "command": "status",
+  "argv": [
+    "./git",
+    "status"
+  ],
+  "result": {
+    "exit_code": 0,
+    "elapsed_core_us": 12573,
+    "elapsed_total_us": 12660
+  },
+  "version": {
+    "git": "2.18.0.rc0.83.gde7fb7c",
+    "slog": 0
+  },
+  "config": {
+    "slog": {
+      "detail": 0,
+      "timers": 0,
+      "aux": 0
+    }
+  }
+}
+
+Both events fields describing the event name, the event time (in
+microseconds since the epoch), the OS process-id, a unique session-id
+(described later), the normalized command name, and a vector of the
+original command line arguments.
+
+The "cmd_exit" event additionally contains information about how the
+process exited, the elapsed time, the version of git and the SLOG
+format, and important SLOG configuration settings.
+
+The fields in the "cmd_start" event are replicated in the "cmd_exit"
+event.  This allows log file post-processors to operate in 2 modes:
+
+1. Just look at the "cmd_exit" events.  This us useful if you just
+   want to analyze the command summary data.
+
+2. Look at the "cmd_start" and "cmd_exit" events as bracketing a time
+   span and examine the detailed activity between.  For example, SLOG
+   can optionally generate "detail" events when spawning child
+   processes and those processes may themselves generate "cmd_start"
+   and "cmd_exit" events.  The (top-level) "cmd_start" event serves as
+   the starting bracket of all of that activity.
+
+
+Target Log File
+===============
+
+SLOG writes events to a log file.  File logging works much like
+GIT_TRACE where events are appended to a file on disk.
+
+Logging is enabled if the config variable "slog.path" is set to an
+absolute pathname.
+
+As with GIT_TRACE, this file is local and private to the user's
+system.  Log file management and rotation is beyond the scope of the
+SLOG effort.
+
+Similarly, if a user wants to provide this data to a developer, they
+must explicitly make these log files available; SLOG does not
+broadcast any of this information.  It is up to the users of this
+system to decide if any sensitive information should be sanitized, and
+how to export the logs.
+
+
+Comparison with GIT_TRACE
+=========================
+
+SLOG is very similar to the existing GIT_TRACE[4] API because both
+write event messages at various points during a command.  However,
+there are some fundamental differences that warrant it being
+considered a separate feature rather than just another
+GIT_TRACE_<key>:
+
+1. GIT_TRACE events are unrelated, line-by-line logging.  SLOG has
+   line-by-line events that show command progress and can serve as
+   structured debug messages.  SLOG also supports accumulating summary
+   data (such as timers) that are automatically added to the final
+   `cmd_exit` event.
+
+2. GIT_TRACE events are unstructured free format printf-style messages
+   which makes post-processing difficult.  SLOG events are written in
+   JSON and can be easily parsed using Perl, Python, and other tools.
+
+3. SLOG uses a well-defined API to build SLOG events containing
+   well-defined fields to make post-command analysis easier.
+
+4. It should be easier to filter/redact sensitive information from
+   SLOG data than from free form data.
+
+5. GIT_TRACE events are controlled by one or more global environment
+   variables which makes it awkward to selectively log some repos and
+   not others.  SLOG events are controlled by a few configuration
+   settings[5].  Users (or system administrators) can configure
+   logging using repo-local or global config settings.
+
+6. GIT_TRACE events do not identify the git process.  This makes it
+   difficult to associate all of events from a particular command.
+   Each SLOG event contains a session id to allow all events for a
+   command to be identified.
+
+7. Some git commands spawn child git commands.  GIT_TRACE has no
+   mechanism to associate events from a child process with the parent
+   process.  SLOG session ids allow child/parent relationships to be
+   tracked (even if there is an intermediate /bin/sh process between
+   them).
+
+8. GIT_TRACE supports logging to a file or stderr.  SLOG only logs to
+   a file.
+
+9. Smashing SLOG into GIT_TRACE doesn't feel like a good fit.  The 2
+   APIs share nothing other than the concept that they write logging
+   data.
+
+
+[1] http://json.org/
+[2] http://www.ietf.org/rfc/rfc7159.txt
+[3] See UTF-8 limitations described in json-writer.h
+[4] Documentation/technical/api-trace.txt
+[5] See "slog.*" in Documentation/config.txt
+[6] https://public-inbox.org/git/20180313004940.GG61720@google.com/t/
+
+
+SLOG Format (V0)
+================
+
+SLOG writes a series of events to the log target.  Each event is a
+self-describing JSON object.
+
+    <event> LF
+    <event> LF
+    <event> LF
+    ...
+
+Each event record in the log file is an independent and complete JSON
+object.  JSON parsers should process the file line-by-line rather than
+trying to parse the entire file into a single object.
+
+    Note: It may be difficult for parsers to find record boundaries if
+    pretty-printing is enabled, so it recommended that pretty-printing
+    only be enabled for interactive debugging and analysis.
+
+Every <event> contains the following fields (F1):
+
+    "event"       : <event_name>
+    "clock_us"    : <event_time>
+    "pid"         : <os_pid>
+    "sid"         : <session_id>
+
+    "command"     : <command_name>
+    "sub_command" : <sub_command_name> (optional)
+
+<event_name> is one of "cmd_start", "cmd_end", or "detail".
+
+<event_time> is the time of the event in microseconds since the epoch.
+
+<os_pid> is the process-id (from getpid()).
+
+<session_id> is a session-id.  (Described later)
+
+<command_name> is a (possibly normalized) command name.  This is
+    usually taken from the cmd_struct[] table after git parses the
+    command line and calls the appropriate cmd_<name>() function.
+    Having it in a top-level field saves post-processors from having
+    to re-parse the command line to discover it.
+
+<sub_command_name> further qualifies the command.  This field is
+    present for common commands that have multiple command modes.  For
+    example, checkout can either change branches and do a full
+    checkout or it can checkout (refresh) an individual file.  A
+    post-processor wanting to compute percentiles for the time spent
+    by branch-changing checkouts could easily filter out the
+    individual file checkouts (and without having to re-parse the
+    command line).
+
+    The set of sub_command values are command-specific and are not
+    listed here.
+
+"event": "cmd_start"
+-------------------
+
+The "cmd_start" event is emitted when git starts when cmd_main() is
+called.  In addition to the F1 fields, it contains the following
+fields (F2):
+
+    "argv"        : <array-of-command-line-arguments>
+
+<argv> is an array of the original command line arguments given to the
+    command (before git.c has a chance to remove the global options
+    before the verb.
+
+
+"event": "cmd_exit"
+-------------------
+
+The "cmd_exit" event is emitted immediately before git exits (during
+an atexit() routine).  It contains the F1 and F2 fields as described
+above.  It also contains the the following fields (F3):
+
+    "result.exit_code"        : <exit_code>
+    "result.errors"           : <arrary_of_error_messages> (optional)
+    "result.elapsed_core_us"  : <elapsed_time_to_exit>
+    "result.elapsed_total_us" : <elapsed_time_to_atexit>
+    "result.signal"           : <signal_value> (optional)
+
+    "verion.git"              : <git_version>
+    "version.slog"            : <slog_version>
+
+    "config.slog.detail"      : <slog_detail>
+    "config.slog.timers"      : <slog_timers>
+    "config.slog.aux"         : <slog_aux>
+    "config.*.*"              : <other_config_settings> (optional)
+
+    "timers"                  : <timers> (optional)
+    "aux"                     : <aux> (optional)
+
+    "child_summary"           : <child_summary> (optional)
+
+<exit_code> is the value passed to exit() or returned from main().
+
+<array_of_error_messages> is an array of messages passed to the die()
+    and error() functions.
+
+<elapsed_time_to_exit> is the elapsed time from start until exit()
+    was called or main() returned.
+
+<elapsed_time_to_atexit> is the elapsed time from start until the slog
+    atexit routine was called.  This time will include any time
+    required to shut down or wait for the pager to complete.
+
+<signal_value> is present if the command was stopped by a single,
+    such as a SIGPIPE when the pager is quit.
+
+<git_version> is the git version number as reported by "git version".
+
+<slog_version> is the SLOG format version.
+
+<slog_{detail,timers,aux}> are the values of the corresponding
+    "slog.{detail,timers,aux}" config setting.  Since these values
+    control optional SLOG features and filtering, these are present
+    to help post-processors know if an expected event did not happen
+    or was simply filtered out.  (Described later)
+
+<other_config_settings> is a place for developers to add additional
+    important config settings to the log.  This is not intended as a
+    dumping ground for all config settings, but rather only ones that
+    might affect performance or allow A/B testing in production.
+
+<timers> is a structure of any SLOG timers used during the process.
+    (Described later)
+
+<aux> is a structure of any "aux data" generated during the process.
+    (Described later)
+
+<child_summary> is a structure summarizing child processes by class.
+    (Described later)
+
+
+"event": "detail" and config setting "slog.detail"
+--------------------------------------------------
+
+The "detail" event is used to report progress and/or debug information
+during a command.  It is a line-by-line (rather than summary) event.
+Like GIT_TRACE_<key>, detail events are classified by "category" and
+may be included or omitted based on the "slog.detail" config setting.
+
+Here are 3 example "detail" events:
+
+{
+  "event": "detail",
+  "clock_us": 1530273485479387,
+  "pid": 107253,
+  "sid": "1530273485473820-107253",
+  "command": "status",
+  "detail": {
+    "category": "index",
+    "label": "lazy_init_name_hash",
+    "data": {
+      "cache_nr": 3269,
+      "elapsed_us": 195,
+      "dir_count": 0,
+      "dir_tablesize": 4096,
+      "name_count": 3269,
+      "name_tablesize": 4096
+    }
+  }
+}
+{
+  "event": "detail",
+  "clock_us": 1530283184051338,
+  "pid": 109679,
+  "sid": "1530283180782876-109679",
+  "command": "fetch",
+  "detail": {
+    "category": "child",
+    "label": "child_starting",
+    "data": {
+      "child_id": 3,
+      "git_cmd": true,
+      "use_shell": false,
+      "is_interactive": false,
+      "child_argv": [
+        "gc",
+        "--auto"
+      ]
+    }
+  }
+}
+{
+  "event": "detail",
+  "clock_us": 1530283184053158,
+  "pid": 109679,
+  "sid": "1530283180782876-109679",
+  "command": "fetch",
+  "detail": {
+    "category": "child",
+    "label": "child_ended",
+    "data": {
+      "child_id": 3,
+      "git_cmd": true,
+      "use_shell": false,
+      "is_interactive": false,
+      "child_argv": [
+        "gc",
+        "--auto"
+      ],
+      "child_pid": 109684,
+      "child_exit_code": 0,
+      "child_elapsed_us": 1819
+    }
+  }
+}
+
+A detail event contains the common F1 described earlier.  It also
+contains 2 fixed fields and 1 variable field:
+
+    "detail.category" : <detail_category>
+    "detail.label"    : <detail_label>
+    "detail.data"     : <detail_data>
+
+<detail_category> is the "category" name for the event.  This is
+    similar to GIT_TRACE_<key>.  In the example above we have 1
+    "index" and 2 "child" category events.
+
+    If the config setting "slog.detail" is true or contains this
+    category name, the event will be generated.  If "slog.detail"
+    is false, no detail events will be generated.
+
+    $ git config slog.detail true
+    $ git config slog.detail child,index,status
+    $ git config slog.detail false
+
+<detail_label> is a descriptive label for the event.  It may be the
+    name of a function or any meaningful value.
+
+<detail_data> is a JSON structure containing context-specific data for
+    the event.  This replaces the need for printf-like trace messages.
+
+
+Child Detail Events
+-------------------
+
+Child detail events build upon the generic detail event and are used
+to log information about spawned child processes.
+
+A "child_starting" detail event is generated immediately before
+spawning a child process.
+
+    "event"                      : "detail:
+    "detail.category"            : "child"
+    "detail.label"               : "child_starting"
+
+    "detail.data.child_id"       : <child_id>
+    "detail.data.git_cmd"        : <is_git_cmd>
+    "detail.data.use_shell"      : <use_shell>
+    "detail.data.is_interactive" : <is_interactive>
+    "detail.data.child_class"    : <child_class>
+    "detail.data.child_argv"     : <child_argv>
+
+<child_id> is a simple integer id number for the child.  This helps
+    match up the "child_starting" and "child_ended" detail events.
+    (The child's PID is not available until it is started.)
+
+<is_git_cmd> is true if git will try to run the command as a git
+    command.  The reported argv[0] for the child is probably a git
+    command verb rather than "git".
+
+<use_shell> is true if gill will try to use the shell to run the
+    command.
+
+<is_interactive> is true if the child is considered interactive.
+    Editor and pager processes are considered interactive.
+
+<child_class> is a classification for the child process, such as
+    "editor", "pager", and "shell".
+
+<child_argv> is the array of arguments to be passed to the child.
+
+
+A "child_ended" detail event is generated after the child process
+terminates and has been reaped.
+
+    "event"                        : "detail:
+    "detail.category"              : "child"
+    "detail.label"                 : "child_ended"
+
+    "detail.data.child_id"         : <child_id>
+    "detail.data.git_cmd"          : <is_git_cmd>
+    "detail.data.use_shell"        : <use_shell>
+    "detail.data.is_interactive"   : <is_interactive>
+    "detail.data.child_class"      : <child_class>
+    "detail.data.child_argv"       : <child_argv>
+
+    "detail.data.child_pid"        : <child_pid>
+    "detail.data.child_exit_code"  : <child_exit_code>
+    "detail.data.child_elapsed_us" : <child_elapsed_time>
+
+<child_pid> is the OS process-id for the child process.
+
+<child_exit_code> is the child's exit code.
+
+<child_elapsed_time> is the elapsed time in microseconds since the
+    "child_starting" event was generated.  This is the observed time
+    the current process waited for the child to complete.  This value
+    will be slightly larger than the value that the child process
+    reports for itself.
+
+
+Child Summary Data <child_summary>
+==================================
+
+If child processes are spawned, a summary is written to the "cmd_exit"
+event.  (This is written even if child detail events are disabled.)
+The summary is grouped by child class (editor, pager, etc.) and contains
+the number of child processes and their total elapsed time.
+
+For example:
+
+{
+  "event": "cmd_exit",
+  ...,
+  "child_summary": {
+    "pager": {
+      "total_us": 14994045,
+      "count": 1
+    }
+  }
+}
+
+    "child_summary.<child_class>.total_us" : <total_us>
+    "child_summary.<child_class>.count"    : <count>
+
+Note that the total child time may exceed the elapsed time for the
+git process because child processes may have been run in parallel.
+
+
+Timers <timers> and config setting "slog.timers"
+================================================
+
+SLOG provides a stopwatch-like timer facility to easily instrument
+small spans of code.  These timers are automatically added to the
+"cmd_exit" event.  These are lighter weight than using explicit
+"detail" events or git_trace_performance_since()-style messages.
+Also, having timer data included in the "cmd_exit" event makes it
+easier for some kinds of post-processing.
+
+For example:
+
+{
+  "event": "cmd_exit",
+  ...,
+  "timers": {
+    "index": {
+      "do_read_index": {
+        "count": 1,
+        "total_us": 488
+      },
+      "preload": {
+        "count": 1,
+        "total_us": 2394
+      }
+    },
+    "status": {
+      "changes_index": {
+        "count": 1,
+        "total_us": 574
+      },
+      "untracked": {
+        "count": 1,
+        "total_us": 5877
+      },
+      "worktree": {
+        "count": 1,
+        "total_us": 92
+      }
+    }
+  }
+}
+
+Timers have a "category" and a "name".  Timers may be enabled or
+disabled by category (much like GIT_TRACE_<key>).  The "slog.timers"
+config setting controls which timers are enabled.  For example:
+
+    $ git config --local slog.timers true
+    $ git config --local slog.timers index,status
+    $ git config --local slog.timers false
+
+Data for the enabled timers is written in the "cmd_exit" event under
+the "timers" structure.  They are grouped by category.  Each timer
+contains the total elapsed time and the number of times the timer was
+started.  Min, max, and average times are included if the timer was
+started/stopped more than once.  And "force_stop" flag is set if the
+timer was still running when the command finished.
+
+    "timers.<category>.<timer_name>.count"      : <start_count>
+    "timers.<category>.<timer_name>.total_us"   : <total_us>
+    "timers.<category>.<timer_name>.min_us"     : <min_us> (optional)
+    "timers.<category>.<timer_name>.max_us"     : <min_us> (optional)
+    "timers.<category>.<timer_name>.avg_us"     : <avg_us> (optional)
+    "timers.<category>.<timer_name>.force_stop" : <bool> (optional)
+    
+    
+Aux Data <aux> and config setting "slog.aux"
+============================================
+
+"Aux" data is intended as a generic container for context-specific
+fields, such as information about the size or shape of the repository.
+This data is automatically added to the "cmd_exit" event.  This is
+data is lighter weight than using explicit detail events and may make
+some kinds of post-processing easier.
+
+For example:
+
+{
+  "event": "cmd_exit",
+  ...,
+  "aux": {
+    "index": [
+      [
+        "cache_nr",
+        3269
+      ],
+      [
+        "sparse_checkout_count",
+        1
+      ]
+    ]
+  }
+
+
+This API adds additional key/value pairs to the "cmd_exit" summary
+data.  Value may be scalars or any JSON structure or array.
+
+Like detail events and timers, each key/value pair is associated with
+a "category" (much like GIT_TRACE_<key>).  The "slog.aux" config
+setting controls which pairs are written or omitted.  For example:
+
+    $ git config --local slog.aux true
+    $ git config --local slog.aux index
+    $ git config --local slog.aux false
+
+Aux data is written in the "cmd_exit" event under the "aux" structure
+and are grouped by category.  Each key/value pair is written as an
+array rather than a structure to allow for duplicate keys.
+
+    "aux.<category>" : [ <kv_pair>, <kv_pair>, ... ]
+
+<kv_pair> is 2-element array of [ <key>, <value> ].
+
+
+Session-Id <session_id>
+=======================
+
+A session id (SID) is a cheap, unique-enough string to associate all
+of the events generated by a single process.  A child git process inherits
+the SID of their parent git process and incorporates it into their SID.
+
+SIDs are constructed as:
+
+    SID ::= <start_time> '-' <pid> [ ':' <parent_sid> ]
+
+This scheme is used rather than a simple PID or {PPID, PID} because
+PIDs are recycled by the OS (after sufficient time).  This also allows
+a git child process to be associated with their git parent process
+even when there is an intermediate shell process.
+
+Note: we could use UUIDs or GUIDs for this, but that seemed overkill
+at this point.  It also required platform-specific code to generate
+which muddied up the code.
+
+
+Detailed Example
+================
+
+Here is a longer example for `git status` with all optional settings
+turned on:
+
+    $ git config slog.detail true
+    $ git config slog.timers true
+    $ git config slog.aux true
+
+    $ git status
+
+    # A "cmd_start" event is written when the command starts.
+
+{
+  "event": "cmd_start",
+  "clock_us": 1531499671154813,
+  "pid": 14667,
+  "sid": "1531499671154813-14667",
+  "command": "status",
+  "argv": [
+    "./git",
+    "status"
+  ]
+}
+
+    # An example detail event was added to lazy_init_name_hash() to
+    # dump the size of the index and the resulting hash tables.
+
+{
+  "event": "detail",
+  "clock_us": 1531499671161042,
+  "pid": 14667,
+  "sid": "1531499671154813-14667",
+  "command": "status",
+  "detail": {
+    "category": "index",
+    "label": "lazy_init_name_hash",
+    "data": {
+      "cache_nr": 3266,
+      "elapsed_us": 214,
+      "dir_count": 0,
+      "dir_tablesize": 4096,
+      "name_count": 3266,
+      "name_tablesize": 4096
+    }
+  }
+}
+
+    # The "cmd_exit" event includes the command result and elapsed
+    # time and the various configuration settings.  During the run
+    # "index" category timers were placed around the do_read_index()
+    # and "preload()" calls and various "status" category timers were
+    # placed around the 3 major parts of the status computation.
+    # Lastly, an "index" category "aux" data item was added to report
+    # the size of the index.
+
+{
+  "event": "cmd_exit",
+  "clock_us": 1531499671168303,
+  "pid": 14667,
+  "sid": "1531499671154813-14667",
+  "command": "status",
+  "argv": [
+    "./git",
+    "status"
+  ],
+  "result": {
+    "exit_code": 0,
+    "elapsed_core_us": 13488,
+    "elapsed_total_us": 13490
+  },
+  "version": {
+    "git": "2.18.0.26.gebaccfc",
+    "slog": 0
+  },
+  "config": {
+    "slog": {
+      "detail": 1,
+      "timers": 1,
+      "aux": 1
+    }
+  },
+  "timers": {
+    "index": {
+      "do_read_index": {
+        "count": 1,
+        "total_us": 553
+      },
+      "preload": {
+        "count": 1,
+        "total_us": 2892
+      }
+    },
+    "status": {
+      "changes_index": {
+        "count": 1,
+        "total_us": 778
+      },
+      "untracked": {
+        "count": 1,
+        "total_us": 6136
+      },
+      "worktree": {
+        "count": 1,
+        "total_us": 106
+      }
+    }
+  },
+  "aux": {
+    "index": [
+      [
+        "cache_nr",
+        3266
+      ]
+    ]
+  }
+}
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 02/25] structured-logging: add STRUCTURED_LOGGING=1 to Makefile
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
  2018-07-13 16:55 ` [PATCH v1 01/25] structured-logging: design document git
@ 2018-07-13 16:55 ` git
  2018-08-21  4:49   ` Jonathan Nieder
  2018-07-13 16:55 ` [PATCH v1 03/25] structured-logging: add structured logging framework git
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 38+ messages in thread
From: git @ 2018-07-13 16:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach the Makefile to take STRUCTURED_LOGGING=1 variable to
compile in/out structured logging feature.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Makefile             |  8 ++++++++
 structured-logging.c |  9 +++++++++
 structured-logging.h | 13 +++++++++++++
 3 files changed, 30 insertions(+)
 create mode 100644 structured-logging.c
 create mode 100644 structured-logging.h

diff --git a/Makefile b/Makefile
index 39ca66b..ccc39bf 100644
--- a/Makefile
+++ b/Makefile
@@ -442,6 +442,8 @@ all::
 # When cross-compiling, define HOST_CPU as the canonical name of the CPU on
 # which the built Git will run (for instance "x86_64").
 #
+# Define STRUCTURED_LOGGING if you want structured logging to be available.
+#
 # Define RUNTIME_PREFIX to configure Git to resolve its ancillary tooling and
 # support files relative to the location of the runtime binary, rather than
 # hard-coding them into the binary. Git installations built with RUNTIME_PREFIX
@@ -955,6 +957,7 @@ LIB_OBJS += split-index.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
+LIB_OBJS += structured-logging.o
 LIB_OBJS += submodule.o
 LIB_OBJS += submodule-config.o
 LIB_OBJS += sub-process.o
@@ -1326,6 +1329,10 @@ ifdef ZLIB_PATH
 endif
 EXTLIBS += -lz
 
+ifdef STRUCTURED_LOGGING
+	BASIC_CFLAGS += -DSTRUCTURED_LOGGING
+endif
+
 ifndef NO_OPENSSL
 	OPENSSL_LIBSSL = -lssl
 	ifdef OPENSSLDIR
@@ -2543,6 +2550,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@+
 	@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
 	@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
+	@echo STRUCTURED_LOGGING=\''$(subst ','\'',$(subst ','\'',$(STRUCTURED_LOGGING)))'\' >>$@+
 	@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE1)))'\' >>$@+
 	@echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+
 	@echo NO_LIBPCRE1_JIT=\''$(subst ','\'',$(subst ','\'',$(NO_LIBPCRE1_JIT)))'\' >>$@+
diff --git a/structured-logging.c b/structured-logging.c
new file mode 100644
index 0000000..702fd84
--- /dev/null
+++ b/structured-logging.c
@@ -0,0 +1,9 @@
+#if !defined(STRUCTURED_LOGGING)
+/*
+ * Structured logging is not available.
+ * Stub out all API routines.
+ */
+
+#else
+
+#endif
diff --git a/structured-logging.h b/structured-logging.h
new file mode 100644
index 0000000..c9e8c1d
--- /dev/null
+++ b/structured-logging.h
@@ -0,0 +1,13 @@
+#ifndef STRUCTURED_LOGGING_H
+#define STRUCTURED_LOGGING_H
+
+#if !defined(STRUCTURED_LOGGING)
+/*
+ * Structured logging is not available.
+ * Stub out all API routines.
+ */
+
+#else
+
+#endif /* STRUCTURED_LOGGING */
+#endif /* STRUCTURED_LOGGING_H */
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 03/25] structured-logging: add structured logging framework
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
  2018-07-13 16:55 ` [PATCH v1 01/25] structured-logging: design document git
  2018-07-13 16:55 ` [PATCH v1 02/25] structured-logging: add STRUCTURED_LOGGING=1 to Makefile git
@ 2018-07-13 16:55 ` git
  2018-07-26  9:09   ` SZEDER Gábor
  2018-08-21  5:05   ` Jonathan Nieder
  2018-07-13 16:56 ` [PATCH v1 04/25] structured-logging: add session-id to log events git
                   ` (23 subsequent siblings)
  26 siblings, 2 replies; 38+ messages in thread
From: git @ 2018-07-13 16:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach git to optionally generate structured logging data in JSON using
the json-writer API.  "cmd_start" and "cmd_end" events are generated.

Structured logging is only available when git is built with
STRUCTURED_LOGGING=1.

Structured logging is only enabled when the config setting "slog.path"
is set to an absolute pathname.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/config.txt |   8 ++
 compat/mingw.h           |   7 +
 config.c                 |   3 +
 git-compat-util.h        |   9 ++
 git.c                    |   8 +-
 structured-logging.c     | 366 +++++++++++++++++++++++++++++++++++++++++++++++
 structured-logging.h     |  82 +++++++++++
 usage.c                  |   4 +
 8 files changed, 486 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf..c79f2bf 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3168,6 +3168,14 @@ showbranch.default::
 	The default set of branches for linkgit:git-show-branch[1].
 	See linkgit:git-show-branch[1].
 
+slog.path::
+	(EXPERIMENTAL) Enable structured logging to a file.  This must be
+	an absolute path.  (Git must be compiled with STRUCTURED_LOGGING=1.)
+
+slog.pretty::
+	(EXPERIMENTAL) Pretty-print structured log data when true.
+	(Git must be compiled with STRUCTURED_LOGGING=1.)
+
 splitIndex.maxPercentChange::
 	When the split index feature is used, this specifies the
 	percent of entries the split index can contain compared to the
diff --git a/compat/mingw.h b/compat/mingw.h
index 571019d..d8d8cd3 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -144,8 +144,15 @@ static inline int fcntl(int fd, int cmd, ...)
 	errno = EINVAL;
 	return -1;
 }
+
 /* bash cannot reliably detect negative return codes as failure */
+#if defined(STRUCTURED_LOGGING)
+#include "structured-logging.h"
+#define exit(code) exit(strlog_exit_code((code) & 0xff))
+#else
 #define exit(code) exit((code) & 0xff)
+#endif
+
 #define sigemptyset(x) (void)0
 static inline int sigaddset(sigset_t *set, int signum)
 { return 0; }
diff --git a/config.c b/config.c
index fbbf0f8..b27b024 100644
--- a/config.c
+++ b/config.c
@@ -1476,6 +1476,9 @@ int git_default_config(const char *var, const char *value, void *dummy)
 		return 0;
 	}
 
+	if (starts_with(var, "slog."))
+		return slog_default_config(var, value);
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/git-compat-util.h b/git-compat-util.h
index 9a64998..f5352fd 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1239,4 +1239,13 @@ extern void unleak_memory(const void *ptr, size_t len);
 #define UNLEAK(var) do {} while (0)
 #endif
 
+#include "structured-logging.h"
+#if defined(STRUCTURED_LOGGING) && !defined(exit)
+/*
+ * Intercept all calls to exit() so that exit-code can be included
+ * in the "cmd_exit" message written by the at-exit routine.
+ */
+#define exit(code) exit(slog_exit_code(code))
+#endif
+
 #endif
diff --git a/git.c b/git.c
index c2f48d5..024a40d 100644
--- a/git.c
+++ b/git.c
@@ -413,6 +413,7 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 		setup_work_tree();
 
 	trace_argv_printf(argv, "trace: built-in: git");
+	slog_set_command_name(p->cmd);
 
 	status = p->fn(argc, argv, prefix);
 	if (status)
@@ -700,7 +701,7 @@ static int run_argv(int *argcp, const char ***argv)
 	return done_alias;
 }
 
-int cmd_main(int argc, const char **argv)
+static int real_cmd_main(int argc, const char **argv)
 {
 	const char *cmd;
 	int done_help = 0;
@@ -779,3 +780,8 @@ int cmd_main(int argc, const char **argv)
 
 	return 1;
 }
+
+int cmd_main(int argc, const char **argv)
+{
+	return slog_wrap_main(real_cmd_main, argc, argv);
+}
diff --git a/structured-logging.c b/structured-logging.c
index 702fd84..afa2224 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -1,3 +1,10 @@
+#include "cache.h"
+#include "config.h"
+#include "version.h"
+#include "json-writer.h"
+#include "sigchain.h"
+#include "argv-array.h"
+
 #if !defined(STRUCTURED_LOGGING)
 /*
  * Structured logging is not available.
@@ -6,4 +13,363 @@
 
 #else
 
+#define SLOG_VERSION 0
+
+static uint64_t my__start_time;
+static uint64_t my__exit_time;
+static int my__is_config_loaded;
+static int my__is_enabled;
+static int my__is_pretty;
+static int my__signal;
+static int my__exit_code;
+static int my__pid;
+static int my__wrote_start_event;
+static int my__log_fd = -1;
+
+static char *my__log_path;
+static char *my__command_name;
+static char *my__sub_command_name;
+
+static struct argv_array my__argv = ARGV_ARRAY_INIT;
+static struct json_writer my__errors = JSON_WRITER_INIT;
+
+/*
+ * Write a single event to the structured log file.
+ */
+static void emit_event(struct json_writer *jw, const char *event_name)
+{
+	if (my__log_fd == -1) {
+		my__log_fd = open(my__log_path,
+				  O_WRONLY | O_APPEND | O_CREAT,
+				  0644);
+		if (my__log_fd == -1) {
+			warning("slog: could not open '%s' for logging: %s",
+				my__log_path, strerror(errno));
+			my__is_enabled = 0;
+			return;
+		}
+	}
+
+	/*
+	 * A properly authored JSON string does not have a final NL
+	 * (even when pretty-printing is enabled).  Structured logging
+	 * output should look like a series of terminated forms one
+	 * per line.  Temporarily append a NL to the buffer so that
+	 * the disk write happens atomically.
+	 */
+	strbuf_addch(&jw->json, '\n');
+	if (write(my__log_fd, jw->json.buf, jw->json.len) != jw->json.len)
+		warning("slog: could not write event '%s': %s",
+			event_name, strerror(errno));
+
+	strbuf_setlen(&jw->json, jw->json.len - 1);
+}
+
+static void emit_start_event(void)
+{
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	/* build "cmd_start" event message */
+	jw_object_begin(&jw, my__is_pretty);
+	{
+		jw_object_string(&jw, "event", "cmd_start");
+		jw_object_intmax(&jw, "clock_us", (intmax_t)my__start_time);
+		jw_object_intmax(&jw, "pid", (intmax_t)my__pid);
+
+		if (my__command_name && *my__command_name)
+			jw_object_string(&jw, "command", my__command_name);
+		if (my__sub_command_name && *my__sub_command_name)
+			jw_object_string(&jw, "sub_command", my__sub_command_name);
+
+		jw_object_inline_begin_array(&jw, "argv");
+		{
+			int k;
+			for (k = 0; k < my__argv.argc; k++)
+				jw_array_string(&jw, my__argv.argv[k]);
+		}
+		jw_end(&jw);
+	}
+	jw_end(&jw);
+
+	emit_event(&jw, "cmd_start");
+	jw_release(&jw);
+
+	my__wrote_start_event = 1;
+}
+
+static void emit_exit_event(void)
+{
+	struct json_writer jw = JSON_WRITER_INIT;
+	uint64_t atexit_time = getnanotime() / 1000;
+
+	/* close unterminated forms */
+	if (my__errors.json.len)
+		jw_end(&my__errors);
+
+	/* build "cmd_exit" event message */
+	jw_object_begin(&jw, my__is_pretty);
+	{
+		jw_object_string(&jw, "event", "cmd_exit");
+		jw_object_intmax(&jw, "clock_us", (intmax_t)atexit_time);
+		jw_object_intmax(&jw, "pid", (intmax_t)my__pid);
+
+		if (my__command_name && *my__command_name)
+			jw_object_string(&jw, "command", my__command_name);
+		if (my__sub_command_name && *my__sub_command_name)
+			jw_object_string(&jw, "sub_command", my__sub_command_name);
+
+		jw_object_inline_begin_array(&jw, "argv");
+		{
+			int k;
+			for (k = 0; k < my__argv.argc; k++)
+				jw_array_string(&jw, my__argv.argv[k]);
+		}
+		jw_end(&jw);
+
+		jw_object_inline_begin_object(&jw, "result");
+		{
+			jw_object_intmax(&jw, "exit_code", my__exit_code);
+			if (my__errors.json.len)
+				jw_object_sub_jw(&jw, "errors", &my__errors);
+
+			if (my__signal)
+				jw_object_intmax(&jw, "signal", my__signal);
+
+			if (my__exit_time > 0)
+				jw_object_intmax(&jw, "elapsed_core_us",
+						 my__exit_time - my__start_time);
+
+			jw_object_intmax(&jw, "elapsed_total_us",
+					 atexit_time - my__start_time);
+		}
+		jw_end(&jw);
+
+		jw_object_inline_begin_object(&jw, "version");
+		{
+			jw_object_string(&jw, "git", git_version_string);
+			jw_object_intmax(&jw, "slog", SLOG_VERSION);
+		}
+		jw_end(&jw);
+	}
+	jw_end(&jw);
+
+	emit_event(&jw, "cmd_exit");
+	jw_release(&jw);
+}
+
+static int cfg_path(const char *key, const char *value)
+{
+	if (is_absolute_path(value)) {
+		my__log_path = xstrdup(value);
+		my__is_enabled = 1;
+	} else {
+		warning("'%s' must be an absolute path: '%s'",
+			key, value);
+	}
+
+	return 0;
+}
+
+static int cfg_pretty(const char *key, const char *value)
+{
+	my__is_pretty = git_config_bool(key, value);
+	return 0;
+}
+
+int slog_default_config(const char *key, const char *value)
+{
+	const char *sub;
+
+	/*
+	 * git_default_config() calls slog_default_config() with "slog.*"
+	 * k/v pairs.  git_default_config() MAY or MAY NOT be called when
+	 * cmd_<command>() calls git_config().
+	 *
+	 * Remember if we've ever been called.
+	 */
+	my__is_config_loaded = 1;
+
+	if (skip_prefix(key, "slog.", &sub)) {
+		if (!strcmp(sub, "path"))
+			return cfg_path(key, value);
+		if (!strcmp(sub, "pretty"))
+			return cfg_pretty(key, value);
+	}
+
+	return 0;
+}
+
+static int lazy_load_config_cb(const char *key, const char * value, void *data)
+{
+	return slog_default_config(key, value);
+}
+
+/*
+ * If cmd_<command>() did not cause slog_default_config() to be called
+ * during git_config(), we try to lookup our config settings the first
+ * time we actually need them.
+ *
+ * (We do this rather than using read_early_config() at initialization
+ * because we want any "-c key=value" arguments to be included.)
+ */
+static inline void lazy_load_config(void)
+{
+	if (my__is_config_loaded)
+		return;
+	my__is_config_loaded = 1;
+
+	read_early_config(lazy_load_config_cb, NULL);
+}
+
+int slog_is_enabled(void)
+{
+	lazy_load_config();
+
+	return my__is_enabled;
+}
+
+static void do_final_steps(int in_signal)
+{
+	static int completed = 0;
+
+	if (completed)
+		return;
+	completed = 1;
+
+	if (slog_is_enabled()) {
+		if (!my__wrote_start_event)
+			emit_start_event();
+		emit_exit_event();
+		my__is_enabled = 0;
+	}
+
+	if (my__log_fd != -1)
+		close(my__log_fd);
+	free(my__log_path);
+	free(my__command_name);
+	free(my__sub_command_name);
+	argv_array_clear(&my__argv);
+	jw_release(&my__errors);
+}
+
+static void slog_atexit(void)
+{
+	do_final_steps(0);
+}
+
+static void slog_signal(int signo)
+{
+	my__signal = signo;
+
+	do_final_steps(1);
+
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+static void intern_argv(int argc, const char **argv)
+{
+	int k;
+
+	for (k = 0; k < argc; k++)
+		argv_array_push(&my__argv, argv[k]);
+}
+
+/*
+ * Collect basic startup information before cmd_main() has a chance
+ * to alter the command line and before we have seen the config (to
+ * know if logging is enabled).  And since the config isn't loaded
+ * until cmd_main() dispatches to cmd_<command>(), we have to wait
+ * and lazy-write the "cmd_start" event.
+ *
+ * This also implies that commands such as "help" and "version" that
+ * don't need load the config won't generate any log data.
+ */
+static void initialize(int argc, const char **argv)
+{
+	my__start_time = getnanotime() / 1000;
+	my__pid = getpid();
+
+	intern_argv(argc, argv);
+
+	atexit(slog_atexit);
+
+	/*
+	 * Put up backstop signal handler to ensure we get the "cmd_exit"
+	 * event.  This is primarily for when the pager throws SIGPIPE
+	 * when the user quits.
+	 */
+	sigchain_push(SIGPIPE, slog_signal);
+}
+
+int slog_wrap_main(slog_fn_main_t fn_main, int argc, const char **argv)
+{
+	int result;
+
+	initialize(argc, argv);
+	result = fn_main(argc, argv);
+	slog_exit_code(result);
+
+	return result;
+}
+
+void slog_set_command_name(const char *command_name)
+{
+	/*
+	 * Capture the command name even if logging is not enabled
+	 * because we don't know if the config has been loaded yet by
+	 * the cmd_<command>() and/or it may be too early to force a
+	 * lazy load.
+	 */
+	if (my__command_name)
+		free(my__command_name);
+	my__command_name = xstrdup(command_name);
+}
+
+void slog_set_sub_command_name(const char *sub_command_name)
+{
+	/*
+	 * Capture the sub-command name even if logging is not enabled
+	 * because we don't know if the config has been loaded yet by
+	 * the cmd_<command>() and/or it may be too early to force a
+	 * lazy load.
+	 */
+	if (my__sub_command_name)
+		free(my__sub_command_name);
+	my__sub_command_name = xstrdup(sub_command_name);
+}
+
+int slog_is_pretty(void)
+{
+	return my__is_pretty;
+}
+
+int slog_exit_code(int exit_code)
+{
+	my__exit_time = getnanotime() / 1000;
+	my__exit_code = exit_code;
+
+	return exit_code;
+}
+
+void slog_error_message(const char *prefix, const char *fmt, va_list params)
+{
+	struct strbuf em = STRBUF_INIT;
+	va_list copy_params;
+
+	if (prefix && *prefix)
+		strbuf_addstr(&em, prefix);
+
+	va_copy(copy_params, params);
+	strbuf_vaddf(&em, fmt, copy_params);
+	va_end(copy_params);
+
+	if (!my__errors.json.len)
+		jw_array_begin(&my__errors, my__is_pretty);
+	jw_array_string(&my__errors, em.buf);
+	/* leave my__errors array unterminated for now */
+
+	strbuf_release(&em);
+}
+
 #endif
diff --git a/structured-logging.h b/structured-logging.h
index c9e8c1d..61e98e6 100644
--- a/structured-logging.h
+++ b/structured-logging.h
@@ -1,13 +1,95 @@
 #ifndef STRUCTURED_LOGGING_H
 #define STRUCTURED_LOGGING_H
 
+typedef int (*slog_fn_main_t)(int, const char **);
+
 #if !defined(STRUCTURED_LOGGING)
 /*
  * Structured logging is not available.
  * Stub out all API routines.
  */
+#define slog_is_available() (0)
+#define slog_default_config(k, v) (0)
+#define slog_wrap_main(real_cmd_main, argc, argv) ((real_cmd_main)((argc), (argv)))
+#define slog_set_command_name(n) do { } while (0)
+#define slog_set_sub_command_name(n) do { } while (0)
+#define slog_is_enabled() (0)
+#define slog_is_pretty() (0)
+#define slog_exit_code(exit_code) (exit_code)
+#define slog_error_message(prefix, fmt, params) do { } while (0)
 
 #else
 
+/*
+ * Is structured logging available (compiled-in)?
+ */
+#define slog_is_available() (1)
+
+/*
+ * Process "slog.*" config settings.
+ */
+int slog_default_config(const char *key, const char *value);
+
+/*
+ * Wrapper for the "real" cmd_main().  Initialize structured logging if
+ * enabled, run the given real_cmd_main(), and capture the return value.
+ *
+ * Note:  common-main.c is shared by many top-level commands.
+ * common-main.c:main() does common process setup before calling
+ * the version of cmd_main() found in the executable.  Some commands
+ * SHOULD NOT do logging (such as t/helper/test-tool).  Ones that do
+ * need some common initialization/teardown.
+ *
+ * Use this function for any top-level command that should do logging.
+ *
+ * Usage:
+ *
+ * static int real_cmd_main(int argc, const char **argv)
+ * {
+ *     ....the actual code for the command....
+ * }
+ *
+ * int cmd_main(int argc, const char **argv)
+ * {
+ *     return slog_wrap_main(real_cmd_main, argc, argv);
+ * }
+ * 
+ *
+ * See git.c for an example.
+ */ 
+int slog_wrap_main(slog_fn_main_t real_cmd_main, int argc, const char **argv);
+
+/*
+ * Record a canonical command name and optional sub-command name for the
+ * current process.  For example, "checkout" and "switch-branch".
+ */
+void slog_set_command_name(const char *name);
+void slog_set_sub_command_name(const char *name);
+
+/*
+ * Is structured logging enabled?
+ */
+int slog_is_enabled(void);
+
+/*
+ * Is JSON pretty-printing enabled?
+ */
+int slog_is_pretty(void);
+
+/*
+ * Register the process exit code with the structured logging layer
+ * and return it.  This value will appear in the final "cmd_exit" event.
+ *
+ * Use this to wrap all calls to exit().
+ * Use this before returning in main().
+ */
+int slog_exit_code(int exit_code);
+
+/*
+ * Append formatted error message to the structured log result.
+ * Messages from this will appear in the final "cmd_exit" event.
+ */
+void slog_error_message(const char *prefix, const char *fmt, va_list params);
+
 #endif /* STRUCTURED_LOGGING */
 #endif /* STRUCTURED_LOGGING_H */
diff --git a/usage.c b/usage.c
index cc80333..5d48f6b 100644
--- a/usage.c
+++ b/usage.c
@@ -27,12 +27,16 @@ static NORETURN void usage_builtin(const char *err, va_list params)
 
 static NORETURN void die_builtin(const char *err, va_list params)
 {
+	slog_error_message("fatal: ", err, params);
+
 	vreportf("fatal: ", err, params);
 	exit(128);
 }
 
 static void error_builtin(const char *err, va_list params)
 {
+	slog_error_message("error: ", err, params);
+
 	vreportf("error: ", err, params);
 }
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 04/25] structured-logging: add session-id to log events
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (2 preceding siblings ...)
  2018-07-13 16:55 ` [PATCH v1 03/25] structured-logging: add structured logging framework git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 05/25] structured-logging: set sub_command field for branch command git
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach git to create a unique session id (SID) during structured
logging initialization and use that SID in all log events.

This SID is exported into a transient environment variable and
inherited by child processes.  This allows git child processes
to be related back to the parent git process event if there are
intermediate /bin/sh processes between them.

Update t0001 to ignore the environment variable GIT_SLOG_PARENT_SID.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/git.txt |  6 ++++++
 structured-logging.c  | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++
 t/t0001-init.sh       |  1 +
 3 files changed, 59 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index dba7f0c..a24f399 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -766,6 +766,12 @@ standard output.
 	adequate and support for it is likely to be removed in the
 	foreseeable future (along with the variable).
 
+`GIT_SLOG_PARENT_SID`::
+	(Experimental) A transient environment variable set by top-level
+	Git commands and inherited by child Git commands.  It contains
+	a session id that will be written the structured logging output
+	to help associate child and parent processes.
+
 Discussion[[Discussion]]
 ------------------------
 
diff --git a/structured-logging.c b/structured-logging.c
index afa2224..289140f 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -31,9 +31,57 @@ static char *my__command_name;
 static char *my__sub_command_name;
 
 static struct argv_array my__argv = ARGV_ARRAY_INIT;
+static struct strbuf my__session_id = STRBUF_INIT;
 static struct json_writer my__errors = JSON_WRITER_INIT;
 
 /*
+ * Compute a new session id for the current process.  Build string
+ * with the start time and PID of the current process and append
+ * the inherited session id from our parent process (if present).
+ * The parent session id may include its parent session id.
+ *
+ * sid := <start-time> '-' <pid> [ ':' <parent-sid> [ ... ] ]
+ */
+static void compute_our_sid(void)
+{
+	const char *parent_sid;
+
+	if (my__session_id.len)
+		return;
+
+	/*
+	 * A "session id" (SID) is a cheap, unique-enough string to
+	 * associate child process with the hierarchy of invoking git
+	 * processes.
+	 *
+	 * This is stronger than a simple parent-pid because we may
+	 * have an intermediate shell between a top-level Git command
+	 * and a child Git command.  It also isolates from issues
+	 * about how the OS recycles PIDs.
+	 *
+	 * This could be a UUID/GUID, but that is overkill for our
+	 * needs here and more expensive to compute.
+	 *
+	 * Consumers should consider this an unordered opaque string
+	 * in case we decide to switch to a real UUID in the future.
+	 */
+	strbuf_addf(&my__session_id, "%"PRIuMAX"-%"PRIdMAX,
+		    (uintmax_t)my__start_time, (intmax_t)my__pid);
+
+	parent_sid = getenv("GIT_SLOG_PARENT_SID");
+	if (parent_sid && *parent_sid) {
+		strbuf_addch(&my__session_id, ':');
+		strbuf_addstr(&my__session_id, parent_sid);
+	}
+
+	/*
+	 * Install our SID into the environment for our child processes
+	 * to inherit.
+	 */
+	setenv("GIT_SLOG_PARENT_SID", my__session_id.buf, 1);
+}
+
+/*
  * Write a single event to the structured log file.
  */
 static void emit_event(struct json_writer *jw, const char *event_name)
@@ -75,6 +123,7 @@ static void emit_start_event(void)
 		jw_object_string(&jw, "event", "cmd_start");
 		jw_object_intmax(&jw, "clock_us", (intmax_t)my__start_time);
 		jw_object_intmax(&jw, "pid", (intmax_t)my__pid);
+		jw_object_string(&jw, "sid", my__session_id.buf);
 
 		if (my__command_name && *my__command_name)
 			jw_object_string(&jw, "command", my__command_name);
@@ -112,6 +161,7 @@ static void emit_exit_event(void)
 		jw_object_string(&jw, "event", "cmd_exit");
 		jw_object_intmax(&jw, "clock_us", (intmax_t)atexit_time);
 		jw_object_intmax(&jw, "pid", (intmax_t)my__pid);
+		jw_object_string(&jw, "sid", my__session_id.buf);
 
 		if (my__command_name && *my__command_name)
 			jw_object_string(&jw, "command", my__command_name);
@@ -250,6 +300,7 @@ static void do_final_steps(int in_signal)
 	free(my__sub_command_name);
 	argv_array_clear(&my__argv);
 	jw_release(&my__errors);
+	strbuf_release(&my__session_id);
 }
 
 static void slog_atexit(void)
@@ -289,6 +340,7 @@ static void initialize(int argc, const char **argv)
 {
 	my__start_time = getnanotime() / 1000;
 	my__pid = getpid();
+	compute_our_sid();
 
 	intern_argv(argc, argv);
 
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index c413bff..3dfc37a 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -92,6 +92,7 @@ test_expect_success 'No extra GIT_* on alias scripts' '
 	env |
 		sed -n \
 			-e "/^GIT_PREFIX=/d" \
+			-e "/^GIT_SLOG_PARENT_SID=/d" \
 			-e "/^GIT_TEXTDOMAINDIR=/d" \
 			-e "/^GIT_/s/=.*//p" |
 		sort
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 05/25] structured-logging: set sub_command field for branch command
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (3 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 04/25] structured-logging: add session-id to log events git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 06/25] structured-logging: set sub_command field for checkout command git
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Set sub-command field for the various forms of the branch command.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/branch.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/builtin/branch.c b/builtin/branch.c
index 5217ba3..fba516f 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -689,10 +689,12 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		setup_auto_pager("branch", 1);
 
 	if (delete) {
+		slog_set_sub_command_name("delete");
 		if (!argc)
 			die(_("branch name required"));
 		return delete_branches(argc, argv, delete > 1, filter.kind, quiet);
 	} else if (list) {
+		slog_set_sub_command_name("list");
 		/*  git branch --local also shows HEAD when it is detached */
 		if ((filter.kind & FILTER_REFS_BRANCHES) && filter.detached)
 			filter.kind |= FILTER_REFS_DETACHED_HEAD;
@@ -716,6 +718,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		const char *branch_name;
 		struct strbuf branch_ref = STRBUF_INIT;
 
+		slog_set_sub_command_name("edit");
 		if (!argc) {
 			if (filter.detached)
 				die(_("Cannot give description to detached HEAD"));
@@ -741,6 +744,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		if (edit_branch_description(branch_name))
 			return 1;
 	} else if (copy) {
+		slog_set_sub_command_name("copy");
 		if (!argc)
 			die(_("branch name required"));
 		else if (argc == 1)
@@ -750,6 +754,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		else
 			die(_("too many branches for a copy operation"));
 	} else if (rename) {
+		slog_set_sub_command_name("rename");
 		if (!argc)
 			die(_("branch name required"));
 		else if (argc == 1)
@@ -761,6 +766,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 	} else if (new_upstream) {
 		struct branch *branch = branch_get(argv[0]);
 
+		slog_set_sub_command_name("new_upstream");
 		if (argc > 1)
 			die(_("too many arguments to set new upstream"));
 
@@ -784,6 +790,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		struct branch *branch = branch_get(argv[0]);
 		struct strbuf buf = STRBUF_INIT;
 
+		slog_set_sub_command_name("unset_upstream");
 		if (argc > 1)
 			die(_("too many arguments to unset upstream"));
 
@@ -806,6 +813,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 	} else if (argc > 0 && argc <= 2) {
 		struct branch *branch = branch_get(argv[0]);
 
+		slog_set_sub_command_name("create");
 		if (!branch)
 			die(_("no such branch '%s'"), argv[0]);
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 06/25] structured-logging: set sub_command field for checkout command
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (4 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 05/25] structured-logging: set sub_command field for branch command git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 07/25] structured-logging: t0420 basic tests git
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/checkout.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2e1d237..d05890b 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -249,6 +249,8 @@ static int checkout_paths(const struct checkout_opts *opts,
 	int errs = 0;
 	struct lock_file lock_file = LOCK_INIT;
 
+	slog_set_sub_command_name(opts->patch_mode ? "patch" : "path");
+
 	if (opts->track != BRANCH_TRACK_UNSPECIFIED)
 		die(_("'%s' cannot be used with updating paths"), "--track");
 
@@ -826,6 +828,9 @@ static int switch_branches(const struct checkout_opts *opts,
 	void *path_to_free;
 	struct object_id rev;
 	int flag, writeout_error = 0;
+
+	slog_set_sub_command_name("switch_branch");
+
 	memset(&old_branch_info, 0, sizeof(old_branch_info));
 	old_branch_info.path = path_to_free = resolve_refdup("HEAD", 0, &rev, &flag);
 	if (old_branch_info.path)
@@ -1037,6 +1042,8 @@ static int switch_unborn_to_new_branch(const struct checkout_opts *opts)
 	int status;
 	struct strbuf branch_ref = STRBUF_INIT;
 
+	slog_set_sub_command_name("switch_unborn_to_new_branch");
+
 	if (!opts->new_branch)
 		die(_("You are on a branch yet to be born"));
 	strbuf_addf(&branch_ref, "refs/heads/%s", opts->new_branch);
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 07/25] structured-logging: t0420 basic tests
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (5 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 06/25] structured-logging: set sub_command field for checkout command git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 08/25] structured-logging: add detail-event facility git
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add structured logging prereq definition "SLOG" to test-lib.sh.
Create t0420 test script with some basic tests.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t0420-structured-logging.sh | 143 ++++++++++++++++++++++++++++++++++++++++++
 t/t0420/parse_json.perl       |  52 +++++++++++++++
 t/test-lib.sh                 |   1 +
 3 files changed, 196 insertions(+)
 create mode 100755 t/t0420-structured-logging.sh
 create mode 100644 t/t0420/parse_json.perl

diff --git a/t/t0420-structured-logging.sh b/t/t0420-structured-logging.sh
new file mode 100755
index 0000000..a594af3
--- /dev/null
+++ b/t/t0420-structured-logging.sh
@@ -0,0 +1,143 @@
+#!/bin/sh
+
+test_description='structured logging tests'
+
+. ./test-lib.sh
+
+if ! test_have_prereq SLOG
+then
+	skip_all='skipping structured logging tests'
+	test_done
+fi
+
+LOGFILE=$TRASH_DIRECTORY/test.log
+
+test_expect_success 'setup' '
+	test_commit hello &&
+	cat >key_cmd_start <<-\EOF &&
+	"event":"cmd_start"
+	EOF
+	cat >key_cmd_exit <<-\EOF &&
+	"event":"cmd_exit"
+	EOF
+	cat >key_exit_code_0 <<-\EOF &&
+	"exit_code":0
+	EOF
+	cat >key_exit_code_129 <<-\EOF &&
+	"exit_code":129
+	EOF
+	git config --local slog.pretty false &&
+	git config --local slog.path "$LOGFILE"
+'
+
+test_expect_success 'basic events' '
+	test_when_finished "rm \"$LOGFILE\"" &&
+	git status >/dev/null &&
+	grep -f key_cmd_start "$LOGFILE" &&
+	grep -f key_cmd_exit "$LOGFILE" &&
+	grep -f key_exit_code_0 "$LOGFILE"
+'
+
+test_expect_success 'basic error code and message' '
+	test_when_finished "rm \"$LOGFILE\" event_exit" &&
+	test_expect_code 129 git status --xyzzy >/dev/null 2>/dev/null &&
+	grep -f key_cmd_exit "$LOGFILE" >event_exit &&
+	grep -f key_exit_code_129 event_exit &&
+	grep "\"errors\":" event_exit
+'
+
+test_lazy_prereq PERLJSON '
+	perl -MJSON -e "exit 0"
+'
+
+# Let perl parse the resulting JSON and dump it out.
+#
+# Since the output contains PIDs, SIDs, clock values, and the full path to
+# git[.exe] we cannot have a HEREDOC with the expected result, so we look
+# for a few key fields.
+#
+test_expect_success PERLJSON 'parse JSON for basic command' '
+	test_when_finished "rm \"$LOGFILE\" event_exit" &&
+	git status >/dev/null &&
+
+	grep -f key_cmd_exit "$LOGFILE" >event_exit &&
+
+	perl "$TEST_DIRECTORY"/t0420/parse_json.perl <event_exit >parsed_exit &&
+
+	grep "row\[0\]\.version\.slog 0" <parsed_exit &&
+	grep "row\[0\]\.argv\[1\] status" <parsed_exit &&
+	grep "row\[0\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[0\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[0\]\.command status" <parsed_exit
+'
+
+test_expect_success PERLJSON 'parse JSON for branch command/sub-command' '
+	test_when_finished "rm \"$LOGFILE\" event_exit" &&
+	git branch -v >/dev/null &&
+	git branch --all >/dev/null &&
+	git branch new_branch >/dev/null &&
+
+	grep -f key_cmd_exit "$LOGFILE" >event_exit &&
+
+	perl "$TEST_DIRECTORY"/t0420/parse_json.perl <event_exit >parsed_exit &&
+
+	grep "row\[0\]\.version\.slog 0" <parsed_exit &&
+	grep "row\[0\]\.argv\[1\] branch" <parsed_exit &&
+	grep "row\[0\]\.argv\[2\] -v" <parsed_exit &&
+	grep "row\[0\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[0\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[0\]\.command branch" <parsed_exit &&
+	grep "row\[0\]\.sub_command list" <parsed_exit &&
+
+	grep "row\[1\]\.argv\[1\] branch" <parsed_exit &&
+	grep "row\[1\]\.argv\[2\] --all" <parsed_exit &&
+	grep "row\[1\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[1\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[1\]\.command branch" <parsed_exit &&
+	grep "row\[1\]\.sub_command list" <parsed_exit &&
+
+	grep "row\[2\]\.argv\[1\] branch" <parsed_exit &&
+	grep "row\[2\]\.argv\[2\] new_branch" <parsed_exit &&
+	grep "row\[2\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[2\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[2\]\.command branch" <parsed_exit &&
+	grep "row\[2\]\.sub_command create" <parsed_exit
+'
+
+test_expect_success PERLJSON 'parse JSON for checkout command' '
+	test_when_finished "rm \"$LOGFILE\" event_exit" &&
+	git checkout new_branch >/dev/null &&
+	git checkout master >/dev/null &&
+	git checkout -- hello.t >/dev/null &&
+
+	grep -f key_cmd_exit "$LOGFILE" >event_exit &&
+
+	perl "$TEST_DIRECTORY"/t0420/parse_json.perl <event_exit >parsed_exit &&
+
+	grep "row\[0\]\.version\.slog 0" <parsed_exit &&
+	grep "row\[0\]\.argv\[1\] checkout" <parsed_exit &&
+	grep "row\[0\]\.argv\[2\] new_branch" <parsed_exit &&
+	grep "row\[0\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[0\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[0\]\.command checkout" <parsed_exit &&
+	grep "row\[0\]\.sub_command switch_branch" <parsed_exit &&
+
+	grep "row\[1\]\.version\.slog 0" <parsed_exit &&
+	grep "row\[1\]\.argv\[1\] checkout" <parsed_exit &&
+	grep "row\[1\]\.argv\[2\] master" <parsed_exit &&
+	grep "row\[1\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[1\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[1\]\.command checkout" <parsed_exit &&
+	grep "row\[1\]\.sub_command switch_branch" <parsed_exit &&
+
+	grep "row\[2\]\.version\.slog 0" <parsed_exit &&
+	grep "row\[2\]\.argv\[1\] checkout" <parsed_exit &&
+	grep "row\[2\]\.argv\[2\] --" <parsed_exit &&
+	grep "row\[2\]\.argv\[3\] hello.t" <parsed_exit &&
+	grep "row\[2\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[2\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[2\]\.command checkout" <parsed_exit &&
+	grep "row\[2\]\.sub_command path" <parsed_exit
+'
+
+test_done
diff --git a/t/t0420/parse_json.perl b/t/t0420/parse_json.perl
new file mode 100644
index 0000000..ca4e5bf
--- /dev/null
+++ b/t/t0420/parse_json.perl
@@ -0,0 +1,52 @@
+#!/usr/bin/perl
+use strict;
+use warnings;
+use JSON;
+
+sub dump_array {
+    my ($label_in, $ary_ref) = @_;
+    my @ary = @$ary_ref;
+
+    for ( my $i = 0; $i <= $#{ $ary_ref }; $i++ )
+    {
+	my $label = "$label_in\[$i\]";
+	dump_item($label, $ary[$i]);
+    }
+}
+
+sub dump_hash {
+    my ($label_in, $obj_ref) = @_;
+    my %obj = %$obj_ref;
+
+    foreach my $k (sort keys %obj) {
+	my $label = (length($label_in) > 0) ? "$label_in.$k" : "$k";
+	my $value = $obj{$k};
+
+	dump_item($label, $value);
+    }
+}
+
+sub dump_item {
+    my ($label_in, $value) = @_;
+    if (ref($value) eq 'ARRAY') {
+	print "$label_in array\n";
+	dump_array($label_in, $value);
+    } elsif (ref($value) eq 'HASH') {
+	print "$label_in hash\n";
+	dump_hash($label_in, $value);
+    } elsif (defined $value) {
+	print "$label_in $value\n";
+    } else {
+	print "$label_in null\n";
+    }
+}
+
+my $row = 0;
+while (<>) {
+    my $data = decode_json( $_ );
+    my $label = "row[$row]";
+
+    dump_hash($label, $data);
+    $row++;
+}
+
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 2831570..3d38bc7 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1071,6 +1071,7 @@ test -n "$USE_LIBPCRE1$USE_LIBPCRE2" && test_set_prereq PCRE
 test -n "$USE_LIBPCRE1" && test_set_prereq LIBPCRE1
 test -n "$USE_LIBPCRE2" && test_set_prereq LIBPCRE2
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
+test -z "$STRUCTURED_LOGGING" || test_set_prereq SLOG
 
 # Can we rely on git's output in the C locale?
 if test -n "$GETTEXT_POISON"
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 08/25] structured-logging: add detail-event facility
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (6 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 07/25] structured-logging: t0420 basic tests git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 09/25] structured-logging: add detail-event for lazy_init_name_hash git
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add a generic "detail-event" to structured logging.  This can be used
to emit context-specific events for performance or debugging purposes.
These are conceptually similar to the various GIT_TRACE_<key> messages.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/config.txt | 13 +++++++
 structured-logging.c     | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
 structured-logging.h     | 16 ++++++++
 3 files changed, 124 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c79f2bf..88f93fe 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3176,6 +3176,19 @@ slog.pretty::
 	(EXPERIMENTAL) Pretty-print structured log data when true.
 	(Git must be compiled with STRUCTURED_LOGGING=1.)
 
+slog.detail::
+	(EXPERIMENTAL) May be set to a boolean value or a list of comma
+	separated tokens.  Controls which categories of optional "detail"
+	events are generated.  Default to off.  This is conceptually
+	similar to the different GIT_TRACE_<key> values.
++
+Detail events are generic events with a context-specific payload.  This
+may represent a single function call or a section of performance sensitive
+code.
++
+This is intended to be an extendable facility where new events can easily
+be added (possibly only for debugging or performance testing purposes).
+
 splitIndex.maxPercentChange::
 	When the split index feature is used, this specifies the
 	percent of entries the split index can contain compared to the
diff --git a/structured-logging.c b/structured-logging.c
index 289140f..9cbf3bd 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -34,6 +34,34 @@ static struct argv_array my__argv = ARGV_ARRAY_INIT;
 static struct strbuf my__session_id = STRBUF_INIT;
 static struct json_writer my__errors = JSON_WRITER_INIT;
 
+struct category_filter
+{
+	char *categories;
+	int want;
+};
+
+static struct category_filter my__detail_categories;
+
+static void set_want_categories(struct category_filter *cf, const char *value)
+{
+	FREE_AND_NULL(cf->categories);
+
+	cf->want = git_parse_maybe_bool(value);
+	if (cf->want == -1)
+		cf->categories = xstrdup(value);
+}
+
+static int want_category(const struct category_filter *cf, const char *category)
+{
+	if (cf->want == 0 || cf->want == 1)
+		return cf->want;
+
+	if (!category || !*category)
+		return 0;
+
+	return !!strstr(cf->categories, category);
+}
+
 /*
  * Compute a new session id for the current process.  Build string
  * with the start time and PID of the current process and append
@@ -207,6 +235,40 @@ static void emit_exit_event(void)
 	jw_release(&jw);
 }
 
+static void emit_detail_event(const char *category, const char *label,
+			      const struct json_writer *data)
+{
+	struct json_writer jw = JSON_WRITER_INIT;
+	uint64_t clock_us = getnanotime() / 1000;
+
+	/* build "detail" event */
+	jw_object_begin(&jw, my__is_pretty);
+	{
+		jw_object_string(&jw, "event", "detail");
+		jw_object_intmax(&jw, "clock_us", (intmax_t)clock_us);
+		jw_object_intmax(&jw, "pid", (intmax_t)my__pid);
+		jw_object_string(&jw, "sid", my__session_id.buf);
+
+		if (my__command_name && *my__command_name)
+			jw_object_string(&jw, "command", my__command_name);
+		if (my__sub_command_name && *my__sub_command_name)
+			jw_object_string(&jw, "sub_command", my__sub_command_name);
+
+		jw_object_inline_begin_object(&jw, "detail");
+		{
+			jw_object_string(&jw, "category", category);
+			jw_object_string(&jw, "label", label);
+			if (data)
+				jw_object_sub_jw(&jw, "data", data);
+		}
+		jw_end(&jw);
+	}
+	jw_end(&jw);
+
+	emit_event(&jw, "detail");
+	jw_release(&jw);
+}
+
 static int cfg_path(const char *key, const char *value)
 {
 	if (is_absolute_path(value)) {
@@ -226,6 +288,12 @@ static int cfg_pretty(const char *key, const char *value)
 	return 0;
 }
 
+static int cfg_detail(const char *key, const char *value)
+{
+	set_want_categories(&my__detail_categories, value);
+	return 0;
+}
+
 int slog_default_config(const char *key, const char *value)
 {
 	const char *sub;
@@ -244,6 +312,8 @@ int slog_default_config(const char *key, const char *value)
 			return cfg_path(key, value);
 		if (!strcmp(sub, "pretty"))
 			return cfg_pretty(key, value);
+		if (!strcmp(sub, "detail"))
+			return cfg_detail(key, value);
 	}
 
 	return 0;
@@ -424,4 +494,29 @@ void slog_error_message(const char *prefix, const char *fmt, va_list params)
 	strbuf_release(&em);
 }
 
+int slog_want_detail_event(const char *category)
+{
+	return want_category(&my__detail_categories, category);
+}
+
+void slog_emit_detail_event(const char *category, const char *label,
+			    const struct json_writer *data)
+{
+	if (!my__wrote_start_event)
+		emit_start_event();
+
+	if (!slog_want_detail_event(category))
+		return;
+
+	if (!category || !*category)
+		BUG("no category for slog.detail event");
+	if (!label || !*label)
+		BUG("no label for slog.detail event");
+	if (data && !jw_is_terminated(data))
+		BUG("unterminated slog.detail data: '%s' '%s' '%s'",
+		    category, label, data->json.buf);
+
+	emit_detail_event(category, label, data);
+}
+
 #endif
diff --git a/structured-logging.h b/structured-logging.h
index 61e98e6..01ae55d 100644
--- a/structured-logging.h
+++ b/structured-logging.h
@@ -1,6 +1,8 @@
 #ifndef STRUCTURED_LOGGING_H
 #define STRUCTURED_LOGGING_H
 
+struct json_writer;
+
 typedef int (*slog_fn_main_t)(int, const char **);
 
 #if !defined(STRUCTURED_LOGGING)
@@ -17,6 +19,8 @@ typedef int (*slog_fn_main_t)(int, const char **);
 #define slog_is_pretty() (0)
 #define slog_exit_code(exit_code) (exit_code)
 #define slog_error_message(prefix, fmt, params) do { } while (0)
+#define slog_want_detail_event(category) (0)
+#define slog_emit_detail_event(category, label, data) do { } while (0)
 
 #else
 
@@ -91,5 +95,17 @@ int slog_exit_code(int exit_code);
  */
 void slog_error_message(const char *prefix, const char *fmt, va_list params);
 
+/*
+ * Is detail logging enabled for this category?
+ */
+int slog_want_detail_event(const char *category);
+
+/*
+ * Write a detail event.
+ */
+
+void slog_emit_detail_event(const char *category, const char *label,
+			    const struct json_writer *data);
+
 #endif /* STRUCTURED_LOGGING */
 #endif /* STRUCTURED_LOGGING_H */
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 09/25] structured-logging: add detail-event for lazy_init_name_hash
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (7 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 08/25] structured-logging: add detail-event facility git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 10/25] structured-logging: add timer facility git
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach git to generate a structured logging detail-event for
lazy_init_name_hash().  This is marked as an "index" category
event and includes time and size data for the hashmaps.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 name-hash.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/name-hash.c b/name-hash.c
index 1638498..939b26a 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -7,6 +7,8 @@
  */
 #define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "json-writer.h"
+#include "structured-logging.h"
 
 struct dir_entry {
 	struct hashmap_entry ent;
@@ -603,6 +605,30 @@ static void lazy_init_name_hash(struct index_state *istate)
 
 	istate->name_hash_initialized = 1;
 	trace_performance_since(start, "initialize name hash");
+
+	if (slog_want_detail_event("index")) {
+		struct json_writer jw = JSON_WRITER_INIT;
+		uint64_t now_ns = getnanotime();
+		uint64_t elapsed_us = (now_ns - start) / 1000;
+
+		jw_object_begin(&jw, slog_is_pretty());
+		{
+			jw_object_intmax(&jw, "cache_nr", istate->cache_nr);
+			jw_object_intmax(&jw, "elapsed_us", elapsed_us);
+			jw_object_intmax(&jw, "dir_count",
+					 hashmap_get_size(&istate->dir_hash));
+			jw_object_intmax(&jw, "dir_tablesize",
+					 istate->dir_hash.tablesize);
+			jw_object_intmax(&jw, "name_count",
+					 hashmap_get_size(&istate->name_hash));
+			jw_object_intmax(&jw, "name_tablesize",
+					 istate->name_hash.tablesize);
+		}
+		jw_end(&jw);
+
+		slog_emit_detail_event("index", "lazy_init_name_hash", &jw);
+		jw_release(&jw);
+	}
 }
 
 /*
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 10/25] structured-logging: add timer facility
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (8 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 09/25] structured-logging: add detail-event for lazy_init_name_hash git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 11/25] structured-logging: add timer around do_read_index git
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add timer facility to structured logging.  This allows stopwatch-like
operations over the life of the git process.  Timer data is summarized
in the "cmd_exit" event.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/config.txt |   6 ++
 structured-logging.c     | 180 +++++++++++++++++++++++++++++++++++++++++++++++
 structured-logging.h     |  19 +++++
 3 files changed, 205 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 88f93fe..7817966 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3189,6 +3189,12 @@ code.
 This is intended to be an extendable facility where new events can easily
 be added (possibly only for debugging or performance testing purposes).
 
+slog.timers::
+	(EXPERIMENTAL) May be set to a boolean value or a list of comma
+	separated tokens.  Controls which categories of SLOG timers are
+	enabled.  Defaults to off.  Data for enabled timers is added to
+	the `cmd_exit` event.
+
 splitIndex.maxPercentChange::
 	When the split index feature is used, this specifies the
 	percent of entries the split index can contain compared to the
diff --git a/structured-logging.c b/structured-logging.c
index 9cbf3bd..215138c 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -15,6 +15,26 @@
 
 #define SLOG_VERSION 0
 
+struct timer_data {
+	char *category;
+	char *name;
+	uint64_t total_ns;
+	uint64_t min_ns;
+	uint64_t max_ns;
+	uint64_t start_ns;
+	int count;
+	int started;
+};
+
+struct timer_data_array {
+	struct timer_data **array;
+	size_t nr, alloc;
+};
+
+static struct timer_data_array my__timers;
+static void format_timers(struct json_writer *jw);
+static void free_timers(void);
+
 static uint64_t my__start_time;
 static uint64_t my__exit_time;
 static int my__is_config_loaded;
@@ -41,6 +61,7 @@ struct category_filter
 };
 
 static struct category_filter my__detail_categories;
+static struct category_filter my__timer_categories;
 
 static void set_want_categories(struct category_filter *cf, const char *value)
 {
@@ -228,6 +249,12 @@ static void emit_exit_event(void)
 			jw_object_intmax(&jw, "slog", SLOG_VERSION);
 		}
 		jw_end(&jw);
+
+		if (my__timers.nr) {
+			jw_object_inline_begin_object(&jw, "timers");
+			format_timers(&jw);
+			jw_end(&jw);
+		}
 	}
 	jw_end(&jw);
 
@@ -294,6 +321,12 @@ static int cfg_detail(const char *key, const char *value)
 	return 0;
 }
 
+static int cfg_timers(const char *key, const char *value)
+{
+	set_want_categories(&my__timer_categories, value);
+	return 0;
+}
+
 int slog_default_config(const char *key, const char *value)
 {
 	const char *sub;
@@ -314,6 +347,8 @@ int slog_default_config(const char *key, const char *value)
 			return cfg_pretty(key, value);
 		if (!strcmp(sub, "detail"))
 			return cfg_detail(key, value);
+		if (!strcmp(sub, "timers"))
+			return cfg_timers(key, value);
 	}
 
 	return 0;
@@ -371,6 +406,7 @@ static void do_final_steps(int in_signal)
 	argv_array_clear(&my__argv);
 	jw_release(&my__errors);
 	strbuf_release(&my__session_id);
+	free_timers();
 }
 
 static void slog_atexit(void)
@@ -519,4 +555,148 @@ void slog_emit_detail_event(const char *category, const char *label,
 	emit_detail_event(category, label, data);
 }
 
+int slog_start_timer(const char *category, const char *name)
+{
+	int k;
+	struct timer_data *td;
+
+	if (!want_category(&my__timer_categories, category))
+		return SLOG_UNDEFINED_TIMER_ID;
+	if (!name || !*name)
+		return SLOG_UNDEFINED_TIMER_ID;
+
+	for (k = 0; k < my__timers.nr; k++) {
+		td = my__timers.array[k];
+		if (!strcmp(category, td->category) && !strcmp(name, td->name))
+			goto start_timer;
+	}
+
+	td = xcalloc(1, sizeof(struct timer_data));
+	td->category = xstrdup(category);
+	td->name = xstrdup(name);
+	td->min_ns = UINT64_MAX;
+
+	ALLOC_GROW(my__timers.array, my__timers.nr + 1, my__timers.alloc);
+	my__timers.array[my__timers.nr++] = td;
+
+start_timer:
+	if (td->started)
+		BUG("slog.timer '%s:%s' already started",
+		    td->category, td->name);
+
+	td->start_ns = getnanotime();
+	td->started = 1;
+
+	return k;
+}
+
+static void stop_timer(struct timer_data *td)
+{
+	uint64_t delta_ns = getnanotime() - td->start_ns;
+
+	td->count++;
+	td->total_ns += delta_ns;
+	if (delta_ns < td->min_ns)
+		td->min_ns = delta_ns;
+	if (delta_ns > td->max_ns)
+		td->max_ns = delta_ns;
+	td->started = 0;
+}
+
+void slog_stop_timer(int tid)
+{
+	struct timer_data *td;
+
+	if (tid == SLOG_UNDEFINED_TIMER_ID)
+		return;
+	if (tid >= my__timers.nr || tid < 0)
+		BUG("Invalid slog.timer id '%d'", tid);
+
+	td = my__timers.array[tid];
+	if (!td->started)
+		BUG("slog.timer '%s:%s' not started", td->category, td->name);
+
+	stop_timer(td);
+}
+
+static int sort_timers_cb(const void *a, const void *b)
+{
+	struct timer_data *td_a = *(struct timer_data **)a;
+	struct timer_data *td_b = *(struct timer_data **)b;
+	int r;
+
+	r = strcmp(td_a->category, td_b->category);
+	if (r)
+		return r;
+	return strcmp(td_a->name, td_b->name);
+}
+
+static void format_a_timer(struct json_writer *jw, struct timer_data *td,
+			   int force_stop)
+{
+	jw_object_inline_begin_object(jw, td->name);
+	{
+		jw_object_intmax(jw, "count", td->count);
+		jw_object_intmax(jw, "total_us", td->total_ns / 1000);
+		if (td->count > 1) {
+			uint64_t avg_ns = td->total_ns / td->count;
+
+			jw_object_intmax(jw, "min_us", td->min_ns / 1000);
+			jw_object_intmax(jw, "max_us", td->max_ns / 1000);
+			jw_object_intmax(jw, "avg_us", avg_ns / 1000);
+		}
+		if (force_stop)
+			jw_object_true(jw, "force_stop");
+	}
+	jw_end(jw);
+}
+
+static void format_timers(struct json_writer *jw)
+{
+	const char *open_category = NULL;
+	int k;
+
+	QSORT(my__timers.array, my__timers.nr, sort_timers_cb);
+
+	for (k = 0; k < my__timers.nr; k++) {
+		struct timer_data *td = my__timers.array[k];
+		int force_stop = td->started;
+
+		if (force_stop)
+			stop_timer(td);
+
+		if (!open_category) {
+			jw_object_inline_begin_object(jw, td->category);
+			open_category = td->category;
+		}
+		else if (strcmp(open_category, td->category)) {
+			jw_end(jw);
+			jw_object_inline_begin_object(jw, td->category);
+			open_category = td->category;
+		}
+
+		format_a_timer(jw, td, force_stop);
+	}
+
+	if (open_category)
+		jw_end(jw);
+}
+
+static void free_timers(void)
+{
+	int k;
+
+	for (k = 0; k < my__timers.nr; k++) {
+		struct timer_data *td = my__timers.array[k];
+
+		free(td->category);
+		free(td->name);
+		free(td);
+	}
+
+	FREE_AND_NULL(my__timers.array);
+	my__timers.nr = 0;
+	my__timers.alloc = 0;
+}
+
 #endif
diff --git a/structured-logging.h b/structured-logging.h
index 01ae55d..a29aa6e 100644
--- a/structured-logging.h
+++ b/structured-logging.h
@@ -5,6 +5,8 @@ struct json_writer;
 
 typedef int (*slog_fn_main_t)(int, const char **);
 
+#define SLOG_UNDEFINED_TIMER_ID (-1)
+
 #if !defined(STRUCTURED_LOGGING)
 /*
  * Structured logging is not available.
@@ -21,6 +23,8 @@ typedef int (*slog_fn_main_t)(int, const char **);
 #define slog_error_message(prefix, fmt, params) do { } while (0)
 #define slog_want_detail_event(category) (0)
 #define slog_emit_detail_event(category, label, data) do { } while (0)
+#define slog_start_timer(category, name) (SLOG_UNDEFINED_TIMER_ID)
+static inline void slog_stop_timer(int tid) { };
 
 #else
 
@@ -107,5 +111,20 @@ int slog_want_detail_event(const char *category);
 void slog_emit_detail_event(const char *category, const char *label,
 			    const struct json_writer *data);
 
+/*
+ * Define and start or restart a structured logging timer.  Stats for the
+ * timer will be added to the "cmd_exit" event. Use a timer when you are
+ * interested in the net time of an operation (such as part of a computation
+ * in a loop) but don't want a detail event for each iteration.
+ *
+ * Returns a timer id.
+ */
+int slog_start_timer(const char *category, const char *name);
+
+/*
+ * Stop the timer.
+ */
+void slog_stop_timer(int tid);
+
 #endif /* STRUCTURED_LOGGING */
 #endif /* STRUCTURED_LOGGING_H */
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 11/25] structured-logging: add timer around do_read_index
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (9 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 10/25] structured-logging: add timer facility git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 12/25] structured-logging: add timer around do_write_index git
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use a SLOG timer to record the time spent in do_read_index()
and report it in the "cmd_exit" event.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 read-cache.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 3725882..df5dc87 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1900,6 +1900,7 @@ static void freshen_shared_index(const char *shared_index, int warn)
 int read_index_from(struct index_state *istate, const char *path,
 		    const char *gitdir)
 {
+	int slog_tid;
 	uint64_t start = getnanotime();
 	struct split_index *split_index;
 	int ret;
@@ -1910,7 +1911,9 @@ int read_index_from(struct index_state *istate, const char *path,
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	slog_tid = slog_start_timer("index", "do_read_index");
 	ret = do_read_index(istate, path, 0);
+	slog_stop_timer(slog_tid);
 	trace_performance_since(start, "read cache %s", path);
 
 	split_index = istate->split_index;
@@ -1926,7 +1929,9 @@ int read_index_from(struct index_state *istate, const char *path,
 
 	base_oid_hex = oid_to_hex(&split_index->base_oid);
 	base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex);
+	slog_tid = slog_start_timer("index", "do_read_index");
 	ret = do_read_index(split_index->base, base_path, 1);
+	slog_stop_timer(slog_tid);
 	if (oidcmp(&split_index->base_oid, &split_index->base->oid))
 		die("broken index, expect %s in %s, got %s",
 		    base_oid_hex, base_path,
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 12/25] structured-logging: add timer around do_write_index
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (10 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 11/25] structured-logging: add timer around do_read_index git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 13/25] structured-logging: add timer around wt-status functions git
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use a SLOG timer to record the time spend in do_write_index() and
report it in the "cmd_exit" event.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 read-cache.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index df5dc87..7fe66b5 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2433,7 +2433,9 @@ static int commit_locked_index(struct lock_file *lk)
 static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
 				 unsigned flags)
 {
+	int slog_timer = slog_start_timer("index", "do_write_index");
 	int ret = do_write_index(istate, lock->tempfile, 0);
+	slog_stop_timer(slog_timer);
 	if (ret)
 		return ret;
 	if (flags & COMMIT_LOCK)
@@ -2514,11 +2516,14 @@ static int clean_shared_index_files(const char *current_hex)
 static int write_shared_index(struct index_state *istate,
 			      struct tempfile **temp)
 {
+	int slog_tid = SLOG_UNDEFINED_TIMER_ID;
 	struct split_index *si = istate->split_index;
 	int ret;
 
 	move_cache_to_base_index(istate);
+	slog_tid = slog_start_timer("index", "do_write_index");
 	ret = do_write_index(si->base, *temp, 1);
+	slog_stop_timer(slog_tid);
 	if (ret)
 		return ret;
 	ret = adjust_shared_perm(get_tempfile_path(*temp));
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 13/25] structured-logging: add timer around wt-status functions
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (11 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 12/25] structured-logging: add timer around do_write_index git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 14/25] structured-logging: add timer around preload_index git
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use a SLOG timer to record the time spend in wt_status_collect_worktree(),
wt_status_collect_changes_initial(), wt_status_collect_changes_index(),
and wt_status_collect_untracked().  These are reported in the "cmd_exit"
event.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 wt-status.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/wt-status.c b/wt-status.c
index d1c0514..f663a37 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -580,8 +580,11 @@ static void wt_status_collect_updated_cb(struct diff_queue_struct *q,
 
 static void wt_status_collect_changes_worktree(struct wt_status *s)
 {
+	int slog_tid;
 	struct rev_info rev;
 
+	slog_tid = slog_start_timer("status", "worktree");
+
 	init_revisions(&rev, NULL);
 	setup_revisions(0, NULL, &rev, NULL);
 	rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
@@ -600,13 +603,18 @@ static void wt_status_collect_changes_worktree(struct wt_status *s)
 	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
 	copy_pathspec(&rev.prune_data, &s->pathspec);
 	run_diff_files(&rev, 0);
+
+	slog_stop_timer(slog_tid);
 }
 
 static void wt_status_collect_changes_index(struct wt_status *s)
 {
+	int slog_tid;
 	struct rev_info rev;
 	struct setup_revision_opt opt;
 
+	slog_tid = slog_start_timer("status", "changes_index");
+
 	init_revisions(&rev, NULL);
 	memset(&opt, 0, sizeof(opt));
 	opt.def = s->is_initial ? empty_tree_oid_hex() : s->reference;
@@ -636,12 +644,17 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
 	copy_pathspec(&rev.prune_data, &s->pathspec);
 	run_diff_index(&rev, 1);
+
+	slog_stop_timer(slog_tid);
 }
 
 static void wt_status_collect_changes_initial(struct wt_status *s)
 {
+	int slog_tid;
 	int i;
 
+	slog_tid = slog_start_timer("status", "changes_initial");
+
 	for (i = 0; i < active_nr; i++) {
 		struct string_list_item *it;
 		struct wt_status_change_data *d;
@@ -672,10 +685,13 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			oidcpy(&d->oid_index, &ce->oid);
 		}
 	}
+
+	slog_stop_timer(slog_tid);
 }
 
 static void wt_status_collect_untracked(struct wt_status *s)
 {
+	int slog_tid;
 	int i;
 	struct dir_struct dir;
 	uint64_t t_begin = getnanotime();
@@ -683,6 +699,8 @@ static void wt_status_collect_untracked(struct wt_status *s)
 	if (!s->show_untracked_files)
 		return;
 
+	slog_tid = slog_start_timer("status", "untracked");
+
 	memset(&dir, 0, sizeof(dir));
 	if (s->show_untracked_files != SHOW_ALL_UNTRACKED_FILES)
 		dir.flags |=
@@ -722,6 +740,8 @@ static void wt_status_collect_untracked(struct wt_status *s)
 
 	if (advice_status_u_option)
 		s->untracked_in_ms = (getnanotime() - t_begin) / 1000000;
+
+	slog_stop_timer(slog_tid);
 }
 
 void wt_status_collect(struct wt_status *s)
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 14/25] structured-logging: add timer around preload_index
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (12 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 13/25] structured-logging: add timer around wt-status functions git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 15/25] structured-logging: t0420 tests for timers git
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use a SLOG timer to record the time spend in preload_index() and
report it in the "cmd_exit" event.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 preload-index.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/preload-index.c b/preload-index.c
index 4d08d44..572bb56 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -116,8 +116,14 @@ static void preload_index(struct index_state *index,
 int read_index_preload(struct index_state *index,
 		       const struct pathspec *pathspec)
 {
+	int slog_tid;
 	int retval = read_index(index);
 
+	slog_tid = slog_start_timer("index", "preload");
+
 	preload_index(index, pathspec);
+
+	slog_stop_timer(slog_tid);
+
 	return retval;
 }
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 15/25] structured-logging: t0420 tests for timers
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (13 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 14/25] structured-logging: add timer around preload_index git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 16/25] structured-logging: add aux-data facility git
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t0420-structured-logging.sh | 48 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t0420-structured-logging.sh b/t/t0420-structured-logging.sh
index a594af3..37c7e83 100755
--- a/t/t0420-structured-logging.sh
+++ b/t/t0420-structured-logging.sh
@@ -140,4 +140,52 @@ test_expect_success PERLJSON 'parse JSON for checkout command' '
 	grep "row\[2\]\.sub_command path" <parsed_exit
 '
 
+test_expect_success PERLJSON 'turn on all timers, verify some are present' '
+	test_when_finished "rm \"$LOGFILE\" event_exit" &&
+	git config --local slog.timers 1 &&
+	rm -f "$LOGFILE" &&
+
+	git status >/dev/null &&
+
+	grep -f key_cmd_exit "$LOGFILE" >event_exit &&
+
+	perl "$TEST_DIRECTORY"/t0420/parse_json.perl <event_exit >parsed_exit &&
+
+	grep "row\[0\]\.version\.slog 0" <parsed_exit &&
+	grep "row\[0\]\.argv\[1\] status" <parsed_exit &&
+	grep "row\[0\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[0\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[0\]\.command status" <parsed_exit &&
+
+	grep "row\[0\]\.timers\.index\.do_read_index\.count" <parsed_exit &&
+	grep "row\[0\]\.timers\.index\.do_read_index\.total_us" <parsed_exit &&
+
+	grep "row\[0\]\.timers\.status\.untracked\.count" <parsed_exit &&
+	grep "row\[0\]\.timers\.status\.untracked\.total_us" <parsed_exit
+'
+
+test_expect_success PERLJSON 'turn on index timers only' '
+	test_when_finished "rm \"$LOGFILE\" event_exit" &&
+	git config --local slog.timers foo,index,bar &&
+	rm -f "$LOGFILE" &&
+
+	git status >/dev/null &&
+
+	grep -f key_cmd_exit "$LOGFILE" >event_exit &&
+
+	perl "$TEST_DIRECTORY"/t0420/parse_json.perl <event_exit >parsed_exit &&
+
+	grep "row\[0\]\.version\.slog 0" <parsed_exit &&
+	grep "row\[0\]\.argv\[1\] status" <parsed_exit &&
+	grep "row\[0\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[0\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[0\]\.command status" <parsed_exit &&
+
+	grep "row\[0\]\.timers\.index\.do_read_index\.count" <parsed_exit &&
+	grep "row\[0\]\.timers\.index\.do_read_index\.total_us" <parsed_exit &&
+
+	test_expect_code 1 grep "row\[0\]\.timers\.status\.untracked\.count" <parsed_exit &&
+	test_expect_code 1 grep "row\[0\]\.timers\.status\.untracked\.total_us" <parsed_exit
+'
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 16/25] structured-logging: add aux-data facility
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (14 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 15/25] structured-logging: t0420 tests for timers git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 17/25] structured-logging: add aux-data for index size git
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add facility to add extra data to the structured logging data allowing
arbitrary key/value pair data to be added to the "cmd_exit" event.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/config.txt |   6 +++
 structured-logging.c     | 116 +++++++++++++++++++++++++++++++++++++++++++++++
 structured-logging.h     |  21 +++++++++
 3 files changed, 143 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 7817966..ca78d4c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3195,6 +3195,12 @@ slog.timers::
 	enabled.  Defaults to off.  Data for enabled timers is added to
 	the `cmd_exit` event.
 
+slog.aux::
+	(EXPERIMENTAL) May be set to a boolean value or a list of
+	comma separated tokens.  Controls which categories of SLOG
+	"aux" data are enabled.  Defaults to off.  "Aux" data is added
+	to the `cmd_exit` event.
+
 splitIndex.maxPercentChange::
 	When the split index feature is used, this specifies the
 	percent of entries the split index can contain compared to the
diff --git a/structured-logging.c b/structured-logging.c
index 215138c..584f70a 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -35,6 +35,19 @@ static struct timer_data_array my__timers;
 static void format_timers(struct json_writer *jw);
 static void free_timers(void);
 
+struct aux_data {
+	char *category;
+	struct json_writer jw;
+};
+
+struct aux_data_array {
+	struct aux_data **array;
+	size_t nr, alloc;
+};
+
+static struct aux_data_array my__aux_data;
+static void format_and_free_aux_data(struct json_writer *jw);
+
 static uint64_t my__start_time;
 static uint64_t my__exit_time;
 static int my__is_config_loaded;
@@ -62,6 +75,7 @@ struct category_filter
 
 static struct category_filter my__detail_categories;
 static struct category_filter my__timer_categories;
+static struct category_filter my__aux_categories;
 
 static void set_want_categories(struct category_filter *cf, const char *value)
 {
@@ -255,6 +269,12 @@ static void emit_exit_event(void)
 			format_timers(&jw);
 			jw_end(&jw);
 		}
+
+		if (my__aux_data.nr) {
+			jw_object_inline_begin_object(&jw, "aux");
+			format_and_free_aux_data(&jw);
+			jw_end(&jw);
+		}
 	}
 	jw_end(&jw);
 
@@ -327,6 +347,12 @@ static int cfg_timers(const char *key, const char *value)
 	return 0;
 }
 
+static int cfg_aux(const char *key, const char *value)
+{
+	set_want_categories(&my__aux_categories, value);
+	return 0;
+}
+
 int slog_default_config(const char *key, const char *value)
 {
 	const char *sub;
@@ -349,6 +375,8 @@ int slog_default_config(const char *key, const char *value)
 			return cfg_detail(key, value);
 		if (!strcmp(sub, "timers"))
 			return cfg_timers(key, value);
+		if (!strcmp(sub, "aux"))
+			return cfg_aux(key, value);
 	}
 
 	return 0;
@@ -699,4 +727,92 @@ static void free_timers(void)
 	my__timers.alloc = 0;
 }
 
+int slog_want_aux(const char *category)
+{
+	return want_category(&my__aux_categories, category);
+}
+
+static struct aux_data *find_aux_data(const char *category)
+{
+	struct aux_data *ad;
+	int k;
+
+	if (!slog_want_aux(category))
+		return NULL;
+
+	for (k = 0; k < my__aux_data.nr; k++) {
+		ad = my__aux_data.array[k];
+		if (!strcmp(category, ad->category))
+			return ad;
+	}
+
+	ad = xcalloc(1, sizeof(struct aux_data));
+	ad->category = xstrdup(category);
+
+	jw_array_begin(&ad->jw, my__is_pretty);
+	/* leave per-category object unterminated for now */
+
+	ALLOC_GROW(my__aux_data.array, my__aux_data.nr + 1, my__aux_data.alloc);
+	my__aux_data.array[my__aux_data.nr++] = ad;
+
+	return ad;
+}
+
+#define add_to_aux(c, k, v, fn)						\
+	do {								\
+		struct aux_data *ad = find_aux_data((c));		\
+		if (ad) {						\
+			jw_array_inline_begin_array(&ad->jw);		\
+			{						\
+				jw_array_string(&ad->jw, (k));		\
+				(fn)(&ad->jw, (v));			\
+			}						\
+			jw_end(&ad->jw);				\
+		}							\
+	} while (0)
+
+void slog_aux_string(const char *category, const char *key, const char *value)
+{
+	add_to_aux(category, key, value, jw_array_string);
+}
+
+void slog_aux_intmax(const char *category, const char *key, intmax_t value)
+{
+	add_to_aux(category, key, value, jw_array_intmax);
+}
+
+void slog_aux_bool(const char *category, const char *key, int value)
+{
+	add_to_aux(category, key, value, jw_array_bool);
+}
+
+void slog_aux_jw(const char *category, const char *key,
+		 const struct json_writer *value)
+{
+	add_to_aux(category, key, value, jw_array_sub_jw);
+}
+
+static void format_and_free_aux_data(struct json_writer *jw)
+{
+	int k;
+
+	for (k = 0; k < my__aux_data.nr; k++) {
+		struct aux_data *ad = my__aux_data.array[k];
+
+		/* terminate per-category form */
+		jw_end(&ad->jw);
+
+		/* insert per-category form into containing "aux" form */
+		jw_object_sub_jw(jw, ad->category, &ad->jw);
+
+		jw_release(&ad->jw);
+		free(ad->category);
+		free(ad);
+	}
+
+	FREE_AND_NULL(my__aux_data.array);
+	my__aux_data.nr = 0;
+	my__aux_data.alloc = 0;
+}
+
 #endif
diff --git a/structured-logging.h b/structured-logging.h
index a29aa6e..2272598 100644
--- a/structured-logging.h
+++ b/structured-logging.h
@@ -25,6 +25,11 @@ typedef int (*slog_fn_main_t)(int, const char **);
 #define slog_emit_detail_event(category, label, data) do { } while (0)
 #define slog_start_timer(category, name) (SLOG_UNDEFINED_TIMER_ID)
 static inline void slog_stop_timer(int tid) { };
+#define slog_want_aux(c) (0)
+#define slog_aux_string(c, k, v) do { } while (0)
+#define slog_aux_intmax(c, k, v) do { } while (0)
+#define slog_aux_bool(c, k, v) do { } while (0)
+#define slog_aux_jw(c, k, v) do { } while (0)
 
 #else
 
@@ -126,5 +131,21 @@ int slog_start_timer(const char *category, const char *name);
  */
 void slog_stop_timer(int tid);
 
+/*
+ * Add arbitrary extra key/value data to the "cmd_exit" event.
+ * These fields will appear under the "aux" object.  This is
+ * intended for "interesting" config values or repo stats, such
+ * as the size of the index.
+ *
+ * These key/value pairs are written as an array-pair rather than
+ * an object/value because the keys may be repeated.
+ */
+int slog_want_aux(const char *category);
+void slog_aux_string(const char *category, const char *key, const char *value);
+void slog_aux_intmax(const char *category, const char *key, intmax_t value);
+void slog_aux_bool(const char *category, const char *key, int value);
+void slog_aux_jw(const char *category, const char *key,
+		 const struct json_writer *value);
+
 #endif /* STRUCTURED_LOGGING */
 #endif /* STRUCTURED_LOGGING_H */
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 17/25] structured-logging: add aux-data for index size
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (15 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 16/25] structured-logging: add aux-data facility git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 18/25] structured-logging: add aux-data for size of sparse-checkout file git
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach do_read_index() and do_write_index() to record the size of the index
in aux-data.  This will be reported in the "cmd_exit" event.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 read-cache.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 7fe66b5..b6e2cfa 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1916,6 +1916,8 @@ int read_index_from(struct index_state *istate, const char *path,
 	slog_stop_timer(slog_tid);
 	trace_performance_since(start, "read cache %s", path);
 
+	slog_aux_intmax("index", "cache_nr", istate->cache_nr);
+
 	split_index = istate->split_index;
 	if (!split_index || is_null_oid(&split_index->base_oid)) {
 		post_read_index_from(istate);
@@ -1937,6 +1939,8 @@ int read_index_from(struct index_state *istate, const char *path,
 		    base_oid_hex, base_path,
 		    oid_to_hex(&split_index->base->oid));
 
+	slog_aux_intmax("index", "split_index_cache_nr", split_index->base->cache_nr);
+
 	freshen_shared_index(base_path, 0);
 	merge_base_index(istate);
 	post_read_index_from(istate);
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 18/25] structured-logging: add aux-data for size of sparse-checkout file
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (16 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 17/25] structured-logging: add aux-data for index size git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 19/25] structured-logging: t0420 tests for aux-data git
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach unpack_trees() to record the number of entries in the sparse-checkout
file in the aux-data.  This will be reported in the "cmd_exit" event.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 unpack-trees.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a85a02..71b1b93 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1285,8 +1285,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		char *sparse = git_pathdup("info/sparse-checkout");
 		if (add_excludes_from_file_to_list(sparse, "", 0, &el, NULL) < 0)
 			o->skip_sparse_checkout = 1;
-		else
+		else {
 			o->el = &el;
+			slog_aux_intmax("index", "sparse_checkout_count", el.nr);
+		}
 		free(sparse);
 	}
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 19/25] structured-logging: t0420 tests for aux-data
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (17 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 18/25] structured-logging: add aux-data for size of sparse-checkout file git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 20/25] structured-logging: add structured logging to remote-curl git
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t0420-structured-logging.sh | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/t/t0420-structured-logging.sh b/t/t0420-structured-logging.sh
index 37c7e83..2e06cd7 100755
--- a/t/t0420-structured-logging.sh
+++ b/t/t0420-structured-logging.sh
@@ -188,4 +188,37 @@ test_expect_success PERLJSON 'turn on index timers only' '
 	test_expect_code 1 grep "row\[0\]\.timers\.status\.untracked\.total_us" <parsed_exit
 '
 
+test_expect_success PERLJSON 'turn on aux-data, verify a few fields' '
+	test_when_finished "rm \"$LOGFILE\" event_exit" &&
+	echo "hello.t" >.git/info/sparse-checkout &&
+	git config --local core.sparsecheckout true &&
+	git config --local slog.aux foo,index,bar &&
+	rm -f "$LOGFILE" &&
+
+	git checkout HEAD &&
+
+	grep -f key_cmd_exit "$LOGFILE" >event_exit &&
+
+	perl "$TEST_DIRECTORY"/t0420/parse_json.perl <event_exit >parsed_exit &&
+
+	grep "row\[0\]\.version\.slog 0" <parsed_exit &&
+	grep "row\[0\]\.argv\[1\] checkout" <parsed_exit &&
+	grep "row\[0\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[0\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[0\]\.command checkout" <parsed_exit &&
+	grep "row\[0\]\.sub_command switch_branch" <parsed_exit &&
+
+	# Expect:
+	#   row[0].aux.index[<k>][0] cache_nr
+	#   row[0].aux.index[<k>][1] 1
+	#   row[0].aux.index[<j>][0] sparse_checkout_count
+	#   row[0].aux.index[<j>][1] 1
+	#
+	# But do not assume values for <j> and <k> (in case the sorting changes
+	# or other "aux" fields are added later).
+
+	grep "row\[0\]\.aux\.index\[.*\]\[0\] cache_nr" <parsed_exit &&
+	grep "row\[0\]\.aux\.index\[.*\]\[0\] sparse_checkout_count" <parsed_exit
+'
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 20/25] structured-logging: add structured logging to remote-curl
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (18 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 19/25] structured-logging: t0420 tests for aux-data git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 21/25] structured-logging: add detail-events for child processes git
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

remote-curl is not a builtin command and therefore, does not inherit
the common cmd_main() startup in git.c

Wrap cmd_main() with slog_cmd_main() in remote-curl to initialize
logging.

Add slog timers around push, fetch, and list verbs.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 remote-curl.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 99b0bed..ed910f8 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1322,8 +1322,9 @@ static int stateless_connect(const char *service_name)
 	return 0;
 }
 
-int cmd_main(int argc, const char **argv)
+static int real_cmd_main(int argc, const char **argv)
 {
+	int slog_tid;
 	struct strbuf buf = STRBUF_INIT;
 	int nongit;
 
@@ -1333,6 +1334,8 @@ int cmd_main(int argc, const char **argv)
 		return 1;
 	}
 
+	slog_set_command_name("remote-curl");
+
 	options.verbosity = 1;
 	options.progress = !!isatty(2);
 	options.thin = 1;
@@ -1362,14 +1365,20 @@ int cmd_main(int argc, const char **argv)
 		if (starts_with(buf.buf, "fetch ")) {
 			if (nongit)
 				die("remote-curl: fetch attempted without a local repo");
+			slog_tid = slog_start_timer("curl", "fetch");
 			parse_fetch(&buf);
+			slog_stop_timer(slog_tid);
 
 		} else if (!strcmp(buf.buf, "list") || starts_with(buf.buf, "list ")) {
 			int for_push = !!strstr(buf.buf + 4, "for-push");
+			slog_tid = slog_start_timer("curl", "list");
 			output_refs(get_refs(for_push));
+			slog_stop_timer(slog_tid);
 
 		} else if (starts_with(buf.buf, "push ")) {
+			slog_tid = slog_start_timer("curl", "push");
 			parse_push(&buf);
+			slog_stop_timer(slog_tid);
 
 		} else if (skip_prefix(buf.buf, "option ", &arg)) {
 			char *value = strchr(arg, ' ');
@@ -1411,3 +1420,8 @@ int cmd_main(int argc, const char **argv)
 
 	return 0;
 }
+
+int cmd_main(int argc, const char **argv)
+{
+	return slog_wrap_main(real_cmd_main, argc, argv);
+}
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 21/25] structured-logging: add detail-events for child processes
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (19 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 20/25] structured-logging: add structured logging to remote-curl git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 22/25] structured-logging: add child process classification git
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach git to emit "detail" events with category "child" before a child
process is started and after it finishes.  These events can be used to
infer time spent by git waiting for child processes to complete.

These events are controlled by the slog.detail config setting.  Set to
true or add the token "child" to it.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 run-command.c        |  14 ++++-
 run-command.h        |   1 +
 structured-logging.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 structured-logging.h |  15 +++++
 4 files changed, 181 insertions(+), 3 deletions(-)

diff --git a/run-command.c b/run-command.c
index 84b883c..30fb4c5 100644
--- a/run-command.c
+++ b/run-command.c
@@ -710,6 +710,8 @@ int start_command(struct child_process *cmd)
 
 	fflush(NULL);
 
+	cmd->slog_child_id = slog_child_starting(cmd);
+
 #ifndef GIT_WINDOWS_NATIVE
 {
 	int notify_pipe[2];
@@ -923,6 +925,9 @@ int start_command(struct child_process *cmd)
 			close_pair(fderr);
 		else if (cmd->err)
 			close(cmd->err);
+
+		slog_child_ended(cmd->slog_child_id, cmd->pid, failed_errno);
+
 		child_process_clear(cmd);
 		errno = failed_errno;
 		return -1;
@@ -949,13 +954,20 @@ int start_command(struct child_process *cmd)
 int finish_command(struct child_process *cmd)
 {
 	int ret = wait_or_whine(cmd->pid, cmd->argv[0], 0);
+
+	slog_child_ended(cmd->slog_child_id, cmd->pid, ret);
+
 	child_process_clear(cmd);
 	return ret;
 }
 
 int finish_command_in_signal(struct child_process *cmd)
 {
-	return wait_or_whine(cmd->pid, cmd->argv[0], 1);
+	int ret = wait_or_whine(cmd->pid, cmd->argv[0], 1);
+
+	slog_child_ended(cmd->slog_child_id, cmd->pid, ret);
+
+	return ret;
 }
 
 
diff --git a/run-command.h b/run-command.h
index 3932420..89c89cf 100644
--- a/run-command.h
+++ b/run-command.h
@@ -12,6 +12,7 @@ struct child_process {
 	struct argv_array args;
 	struct argv_array env_array;
 	pid_t pid;
+	int slog_child_id;
 	/*
 	 * Using .in, .out, .err:
 	 * - Specify 0 for no redirections (child inherits stdin, stdout,
diff --git a/structured-logging.c b/structured-logging.c
index 584f70a..dbe60b7 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -4,6 +4,7 @@
 #include "json-writer.h"
 #include "sigchain.h"
 #include "argv-array.h"
+#include "run-command.h"
 
 #if !defined(STRUCTURED_LOGGING)
 /*
@@ -48,6 +49,23 @@ struct aux_data_array {
 static struct aux_data_array my__aux_data;
 static void format_and_free_aux_data(struct json_writer *jw);
 
+struct child_data {
+	uint64_t start_ns;
+	uint64_t end_ns;
+	struct json_writer jw_argv;
+	unsigned int is_running:1;
+	unsigned int is_git_cmd:1;
+	unsigned int use_shell:1;
+};
+
+struct child_data_array {
+	struct child_data **array;
+	size_t nr, alloc;
+};
+
+static struct child_data_array my__child_data;
+static void free_children(void);
+
 static uint64_t my__start_time;
 static uint64_t my__exit_time;
 static int my__is_config_loaded;
@@ -283,10 +301,11 @@ static void emit_exit_event(void)
 }
 
 static void emit_detail_event(const char *category, const char *label,
+			      uint64_t clock_ns,
 			      const struct json_writer *data)
 {
 	struct json_writer jw = JSON_WRITER_INIT;
-	uint64_t clock_us = getnanotime() / 1000;
+	uint64_t clock_us = clock_ns / 1000;
 
 	/* build "detail" event */
 	jw_object_begin(&jw, my__is_pretty);
@@ -435,6 +454,7 @@ static void do_final_steps(int in_signal)
 	jw_release(&my__errors);
 	strbuf_release(&my__session_id);
 	free_timers();
+	free_children();
 }
 
 static void slog_atexit(void)
@@ -580,7 +600,7 @@ void slog_emit_detail_event(const char *category, const char *label,
 		BUG("unterminated slog.detail data: '%s' '%s' '%s'",
 		    category, label, data->json.buf);
 
-	emit_detail_event(category, label, data);
+	emit_detail_event(category, label, getnanotime(), data);
 }
 
 int slog_start_timer(const char *category, const char *name)
@@ -815,4 +835,134 @@ static void format_and_free_aux_data(struct json_writer *jw)
 	my__aux_data.alloc = 0;
 }
 
+static struct child_data *alloc_child_data(const struct child_process *cmd)
+{
+	struct child_data *cd = xcalloc(1, sizeof(struct child_data));
+
+	cd->start_ns = getnanotime();
+	cd->is_running = 1;
+	cd->is_git_cmd = cmd->git_cmd;
+	cd->use_shell = cmd->use_shell;
+
+	jw_init(&cd->jw_argv);
+
+	jw_array_begin(&cd->jw_argv, my__is_pretty);
+	{
+		jw_array_argv(&cd->jw_argv, cmd->argv);
+	}
+	jw_end(&cd->jw_argv);
+
+	return cd;
+}
+
+static int insert_child_data(struct child_data *cd)
+{
+	int child_id = my__child_data.nr;
+
+	ALLOC_GROW(my__child_data.array, my__child_data.nr + 1,
+		   my__child_data.alloc);
+	my__child_data.array[my__child_data.nr++] = cd;
+
+	return child_id;
+}
+
+int slog_child_starting(const struct child_process *cmd)
+{
+	struct child_data *cd;
+	int child_id;
+
+	if (!slog_is_enabled())
+		return SLOG_UNDEFINED_CHILD_ID;
+
+	/*
+	 * If we have not yet written a cmd_start event (and even if
+	 * we do not emit this child_start event), force the cmd_start
+	 * event now so that it appears in the log before any events
+	 * that the child process itself emits.
+	 */
+	if (!my__wrote_start_event)
+		emit_start_event();
+
+	cd = alloc_child_data(cmd);
+	child_id = insert_child_data(cd);
+
+	/* build data portion for a "detail" event */
+	if (slog_want_detail_event("child")) {
+		struct json_writer jw_data = JSON_WRITER_INIT;
+
+		jw_object_begin(&jw_data, my__is_pretty);
+		{
+			jw_object_intmax(&jw_data, "child_id", child_id);
+			jw_object_bool(&jw_data, "git_cmd", cd->is_git_cmd);
+			jw_object_bool(&jw_data, "use_shell", cd->use_shell);
+			jw_object_sub_jw(&jw_data, "child_argv", &cd->jw_argv);
+		}
+		jw_end(&jw_data);
+
+		emit_detail_event("child", "child_starting", cd->start_ns,
+				  &jw_data);
+		jw_release(&jw_data);
+	}
+
+	return child_id;
+}
+
+void slog_child_ended(int child_id, int child_pid, int child_exit_code)
+{
+	struct child_data *cd;
+
+	if (!slog_is_enabled())
+		return;
+	if (child_id == SLOG_UNDEFINED_CHILD_ID)
+		return;
+	if (child_id >= my__child_data.nr || child_id < 0)
+		BUG("Invalid slog.child id '%d'", child_id);
+
+	cd = my__child_data.array[child_id];
+	if (!cd->is_running)
+		BUG("slog.child '%d' already stopped", child_id);
+
+	cd->end_ns = getnanotime();
+	cd->is_running = 0;
+
+	/* build data portion for a "detail" event */
+	if (slog_want_detail_event("child")) {
+		struct json_writer jw_data = JSON_WRITER_INIT;
+
+		jw_object_begin(&jw_data, my__is_pretty);
+		{
+			jw_object_intmax(&jw_data, "child_id", child_id);
+			jw_object_bool(&jw_data, "git_cmd", cd->is_git_cmd);
+			jw_object_bool(&jw_data, "use_shell", cd->use_shell);
+			jw_object_sub_jw(&jw_data, "child_argv", &cd->jw_argv);
+
+			jw_object_intmax(&jw_data, "child_pid", child_pid);
+			jw_object_intmax(&jw_data, "child_exit_code",
+					 child_exit_code);
+			jw_object_intmax(&jw_data, "child_elapsed_us",
+					 (cd->end_ns - cd->start_ns) / 1000);
+		}
+		jw_end(&jw_data);
+
+		emit_detail_event("child", "child_ended", cd->end_ns, &jw_data);
+		jw_release(&jw_data);
+	}
+}
+
+static void free_children(void)
+{
+	int k;
+
+	for (k = 0; k < my__child_data.nr; k++) {
+		struct child_data *cd = my__child_data.array[k];
+
+		jw_release(&cd->jw_argv);
+		free(cd);
+	}
+
+	FREE_AND_NULL(my__child_data.array);
+	my__child_data.nr = 0;
+	my__child_data.alloc = 0;
+}
+
 #endif
diff --git a/structured-logging.h b/structured-logging.h
index 2272598..7c98d33 100644
--- a/structured-logging.h
+++ b/structured-logging.h
@@ -2,10 +2,12 @@
 #define STRUCTURED_LOGGING_H
 
 struct json_writer;
+struct child_process;
 
 typedef int (*slog_fn_main_t)(int, const char **);
 
 #define SLOG_UNDEFINED_TIMER_ID (-1)
+#define SLOG_UNDEFINED_CHILD_ID (-1)
 
 #if !defined(STRUCTURED_LOGGING)
 /*
@@ -30,6 +32,8 @@ static inline void slog_stop_timer(int tid) { };
 #define slog_aux_intmax(c, k, v) do { } while (0)
 #define slog_aux_bool(c, k, v) do { } while (0)
 #define slog_aux_jw(c, k, v) do { } while (0)
+#define slog_child_starting(cmd) (SLOG_UNDEFINED_CHILD_ID)
+#define slog_child_ended(i, p, ec) do { } while (0)
 
 #else
 
@@ -147,5 +151,16 @@ void slog_aux_bool(const char *category, const char *key, int value);
 void slog_aux_jw(const char *category, const char *key,
 		 const struct json_writer *value);
 
+/*
+ * Emit a detail event of category "child" and label "child_starting"
+ * or "child_ending" with information about the child process.  Note
+ * that this is in addition to any events that the child process itself
+ * generates.
+ *
+ * Set "slog.detail" to true or contain "child" to get these events.
+ */
+int slog_child_starting(const struct child_process *cmd);
+void slog_child_ended(int child_id, int child_pid, int child_exit_code);
+
 #endif /* STRUCTURED_LOGGING */
 #endif /* STRUCTURED_LOGGING_H */
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 22/25] structured-logging: add child process classification
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (20 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 21/25] structured-logging: add detail-events for child processes git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 23/25] structured-logging: t0420 tests for child process detail events git
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach git to classify child processes as "editor", "pager", "subprocess",
"alias", "shell", or "other".

Add the child process classification to the child detail events.

Mark child processes of class "editor" or "pager" as interactive in the
child detail event.

Add child summary to cmd_exit event grouping child process by class.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 editor.c             |   1 +
 git.c                |   2 +
 pager.c              |   1 +
 run-command.h        |   1 +
 structured-logging.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++
 sub-process.c        |   1 +
 6 files changed, 125 insertions(+)

diff --git a/editor.c b/editor.c
index 9a9b4e1..6f5ccf3 100644
--- a/editor.c
+++ b/editor.c
@@ -66,6 +66,7 @@ int launch_editor(const char *path, struct strbuf *buffer, const char *const *en
 		p.argv = args;
 		p.env = env;
 		p.use_shell = 1;
+		p.slog_child_class = "editor";
 		if (start_command(&p) < 0)
 			return error("unable to start editor '%s'", editor);
 
diff --git a/git.c b/git.c
index 024a40d..f1cb29e 100644
--- a/git.c
+++ b/git.c
@@ -328,6 +328,7 @@ static int handle_alias(int *argcp, const char ***argv)
 			commit_pager_choice();
 
 			child.use_shell = 1;
+			child.slog_child_class = "alias";
 			argv_array_push(&child.args, alias_string + 1);
 			argv_array_pushv(&child.args, (*argv) + 1);
 
@@ -651,6 +652,7 @@ static void execv_dashed_external(const char **argv)
 	cmd.clean_on_exit = 1;
 	cmd.wait_after_clean = 1;
 	cmd.silent_exec_failure = 1;
+	cmd.slog_child_class = "alias";
 
 	trace_argv_printf(cmd.args.argv, "trace: exec:");
 
diff --git a/pager.c b/pager.c
index a768797..5939077 100644
--- a/pager.c
+++ b/pager.c
@@ -100,6 +100,7 @@ void prepare_pager_args(struct child_process *pager_process, const char *pager)
 	argv_array_push(&pager_process->args, pager);
 	pager_process->use_shell = 1;
 	setup_pager_env(&pager_process->env_array);
+	pager_process->slog_child_class = "pager";
 }
 
 void setup_pager(void)
diff --git a/run-command.h b/run-command.h
index 89c89cf..8c99bd1 100644
--- a/run-command.h
+++ b/run-command.h
@@ -13,6 +13,7 @@ struct child_process {
 	struct argv_array env_array;
 	pid_t pid;
 	int slog_child_id;
+	const char *slog_child_class;
 	/*
 	 * Using .in, .out, .err:
 	 * - Specify 0 for no redirections (child inherits stdin, stdout,
diff --git a/structured-logging.c b/structured-logging.c
index dbe60b7..2571e79 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -49,13 +49,30 @@ struct aux_data_array {
 static struct aux_data_array my__aux_data;
 static void format_and_free_aux_data(struct json_writer *jw);
 
+struct child_summary_data {
+	char *child_class;
+	uint64_t total_ns;
+	int count;
+};
+
+struct child_summary_data_array {
+	struct child_summary_data **array;
+	size_t nr, alloc;
+};
+
+static struct child_summary_data_array my__child_summary_data;
+static void format_child_summary_data(struct json_writer *jw);
+static void free_child_summary_data(void);
+
 struct child_data {
 	uint64_t start_ns;
 	uint64_t end_ns;
 	struct json_writer jw_argv;
+	char *child_class;
 	unsigned int is_running:1;
 	unsigned int is_git_cmd:1;
 	unsigned int use_shell:1;
+	unsigned int is_interactive:1;
 };
 
 struct child_data_array {
@@ -293,6 +310,12 @@ static void emit_exit_event(void)
 			format_and_free_aux_data(&jw);
 			jw_end(&jw);
 		}
+
+		if (my__child_summary_data.nr) {
+			jw_object_inline_begin_object(&jw, "child_summary");
+			format_child_summary_data(&jw);
+			jw_end(&jw);
+		}
 	}
 	jw_end(&jw);
 
@@ -453,6 +476,7 @@ static void do_final_steps(int in_signal)
 	argv_array_clear(&my__argv);
 	jw_release(&my__errors);
 	strbuf_release(&my__session_id);
+	free_child_summary_data();
 	free_timers();
 	free_children();
 }
@@ -835,6 +859,85 @@ static void format_and_free_aux_data(struct json_writer *jw)
 	my__aux_data.alloc = 0;
 }
 
+static struct child_summary_data *find_child_summary_data(
+	const struct child_data *cd)
+{
+	struct child_summary_data *csd;
+	char *child_class;
+	int k;
+
+	child_class = cd->child_class;
+	if (!child_class || !*child_class) {
+		if (cd->use_shell)
+			child_class = "shell";
+		child_class = "other";
+	}
+
+	for (k = 0; k < my__child_summary_data.nr; k++) {
+		csd = my__child_summary_data.array[k];
+		if (!strcmp(child_class, csd->child_class))
+			return csd;
+	}
+
+	csd = xcalloc(1, sizeof(struct child_summary_data));
+	csd->child_class = xstrdup(child_class);
+
+	ALLOC_GROW(my__child_summary_data.array, my__child_summary_data.nr + 1,
+		   my__child_summary_data.alloc);
+	my__child_summary_data.array[my__child_summary_data.nr++] = csd;
+
+	return csd;
+}
+
+static void add_child_to_summary_data(const struct child_data *cd)
+{
+	struct child_summary_data *csd = find_child_summary_data(cd);
+
+	csd->total_ns += cd->end_ns - cd->start_ns;
+	csd->count++;
+}
+
+static void format_child_summary_data(struct json_writer *jw)
+{
+	int k;
+
+	for (k = 0; k < my__child_summary_data.nr; k++) {
+		struct child_summary_data *csd = my__child_summary_data.array[k];
+
+		jw_object_inline_begin_object(jw, csd->child_class);
+		{
+			jw_object_intmax(jw, "total_us", csd->total_ns / 1000);
+			jw_object_intmax(jw, "count", csd->count);
+		}
+		jw_end(jw);
+	}
+}
+
+static void free_child_summary_data(void)
+{
+	int k;
+
+	for (k = 0; k < my__child_summary_data.nr; k++) {
+		struct child_summary_data *csd = my__child_summary_data.array[k];
+
+		free(csd->child_class);
+		free(csd);
+	}
+
+	free(my__child_summary_data.array);
+}
+
+static unsigned int is_interactive(const char *child_class)
+{
+	if (child_class && *child_class) {
+		if (!strcmp(child_class, "editor"))
+			return 1;
+		if (!strcmp(child_class, "pager"))
+			return 1;
+	}
+	return 0;
+}
+
 static struct child_data *alloc_child_data(const struct child_process *cmd)
 {
 	struct child_data *cd = xcalloc(1, sizeof(struct child_data));
@@ -843,6 +946,9 @@ static struct child_data *alloc_child_data(const struct child_process *cmd)
 	cd->is_running = 1;
 	cd->is_git_cmd = cmd->git_cmd;
 	cd->use_shell = cmd->use_shell;
+	cd->is_interactive = is_interactive(cmd->slog_child_class);
+	if (cmd->slog_child_class && *cmd->slog_child_class)
+		cd->child_class = xstrdup(cmd->slog_child_class);
 
 	jw_init(&cd->jw_argv);
 
@@ -895,6 +1001,11 @@ int slog_child_starting(const struct child_process *cmd)
 			jw_object_intmax(&jw_data, "child_id", child_id);
 			jw_object_bool(&jw_data, "git_cmd", cd->is_git_cmd);
 			jw_object_bool(&jw_data, "use_shell", cd->use_shell);
+			jw_object_bool(&jw_data, "is_interactive",
+				       cd->is_interactive);
+			if (cd->child_class)
+				jw_object_string(&jw_data, "child_class",
+						 cd->child_class);
 			jw_object_sub_jw(&jw_data, "child_argv", &cd->jw_argv);
 		}
 		jw_end(&jw_data);
@@ -925,6 +1036,8 @@ void slog_child_ended(int child_id, int child_pid, int child_exit_code)
 	cd->end_ns = getnanotime();
 	cd->is_running = 0;
 
+	add_child_to_summary_data(cd);
+
 	/* build data portion for a "detail" event */
 	if (slog_want_detail_event("child")) {
 		struct json_writer jw_data = JSON_WRITER_INIT;
@@ -934,6 +1047,11 @@ void slog_child_ended(int child_id, int child_pid, int child_exit_code)
 			jw_object_intmax(&jw_data, "child_id", child_id);
 			jw_object_bool(&jw_data, "git_cmd", cd->is_git_cmd);
 			jw_object_bool(&jw_data, "use_shell", cd->use_shell);
+			jw_object_bool(&jw_data, "is_interactive",
+				       cd->is_interactive);
+			if (cd->child_class)
+				jw_object_string(&jw_data, "child_class",
+						 cd->child_class);
 			jw_object_sub_jw(&jw_data, "child_argv", &cd->jw_argv);
 
 			jw_object_intmax(&jw_data, "child_pid", child_pid);
@@ -957,6 +1075,7 @@ static void free_children(void)
 		struct child_data *cd = my__child_data.array[k];
 
 		jw_release(&cd->jw_argv);
+		free(cd->child_class);
 		free(cd);
 	}
 
diff --git a/sub-process.c b/sub-process.c
index 8d2a170..93f7a52 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -88,6 +88,7 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
 	process->out = -1;
 	process->clean_on_exit = 1;
 	process->clean_on_exit_handler = subprocess_exit_handler;
+	process->slog_child_class = "subprocess";
 
 	err = start_command(process);
 	if (err) {
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 23/25] structured-logging: t0420 tests for child process detail events
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (21 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 22/25] structured-logging: add child process classification git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 24/25] structured-logging: t0420 tests for interacitve child_summary git
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t0420-structured-logging.sh | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/t/t0420-structured-logging.sh b/t/t0420-structured-logging.sh
index 2e06cd7..4ac404d 100755
--- a/t/t0420-structured-logging.sh
+++ b/t/t0420-structured-logging.sh
@@ -26,6 +26,9 @@ test_expect_success 'setup' '
 	cat >key_exit_code_129 <<-\EOF &&
 	"exit_code":129
 	EOF
+	cat >key_detail <<-\EOF &&
+	"event":"detail"
+	EOF
 	git config --local slog.pretty false &&
 	git config --local slog.path "$LOGFILE"
 '
@@ -221,4 +224,40 @@ test_expect_success PERLJSON 'turn on aux-data, verify a few fields' '
 	grep "row\[0\]\.aux\.index\[.*\]\[0\] sparse_checkout_count" <parsed_exit
 '
 
+test_expect_success PERLJSON 'verify child start/end events during clone' '
+	test_when_finished "rm \"$LOGFILE\" event_exit" &&
+	git config --local slog.aux false &&
+	git config --local slog.detail false &&
+	git config --local slog.timers false &&
+	rm -f "$LOGFILE" &&
+
+	# Clone seems to read the config after it switches to the target repo
+	# rather than the source repo, so we have to explicitly set the config
+	# settings on the command line.
+	git -c slog.path="$LOGFILE" -c slog.detail=true clone . ./clone1 &&
+
+	grep -f key_cmd_exit "$LOGFILE" >event_exit &&
+	grep -f key_detail "$LOGFILE" >event_detail &&
+
+	perl "$TEST_DIRECTORY"/t0420/parse_json.perl <event_exit >parsed_exit &&
+	perl "$TEST_DIRECTORY"/t0420/parse_json.perl <event_detail >parsed_detail &&
+
+	grep "row\[0\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[0\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[0\]\.command upload-pack" <parsed_exit &&
+
+	grep "row\[1\]\.event cmd_exit" <parsed_exit &&
+	grep "row\[1\]\.result\.exit_code 0" <parsed_exit &&
+	grep "row\[1\]\.command clone" <parsed_exit &&
+
+	grep "row\[0\]\.detail\.label child_starting" <parsed_detail &&
+	grep "row\[0\]\.detail\.data\.child_id 0" <parsed_detail &&
+	grep "row\[0\]\.detail\.data\.child_argv\[0\] git-upload-pack" <parsed_detail &&
+
+	grep "row\[1\]\.detail\.label child_ended" <parsed_detail &&
+	grep "row\[1\]\.detail\.data\.child_id 0" <parsed_detail &&
+	grep "row\[1\]\.detail\.data\.child_argv\[0\] git-upload-pack" <parsed_detail &&
+	grep "row\[1\]\.detail\.data\.child_exit_code 0" <parsed_detail
+'
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 24/25] structured-logging: t0420 tests for interacitve child_summary
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (22 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 23/25] structured-logging: t0420 tests for child process detail events git
@ 2018-07-13 16:56 ` git
  2018-07-13 16:56 ` [PATCH v1 25/25] structured-logging: add config data facility git
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Test running a command with a fake pager and verify that a child_summary
is generated.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t0420-structured-logging.sh | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/t/t0420-structured-logging.sh b/t/t0420-structured-logging.sh
index 4ac404d..69f811a 100755
--- a/t/t0420-structured-logging.sh
+++ b/t/t0420-structured-logging.sh
@@ -260,4 +260,34 @@ test_expect_success PERLJSON 'verify child start/end events during clone' '
 	grep "row\[1\]\.detail\.data\.child_exit_code 0" <parsed_detail
 '
 
+. "$TEST_DIRECTORY"/lib-pager.sh
+. "$TEST_DIRECTORY"/lib-terminal.sh
+
+test_expect_success 'setup fake pager to test interactive' '
+	test_when_finished "rm \"$LOGFILE\" " &&
+	sane_unset GIT_PAGER GIT_PAGER_IN_USE &&
+	test_unconfig core.pager &&
+
+	PAGER="cat >paginated.out" &&
+	export PAGER &&
+
+	test_commit initial
+'
+
+test_expect_success TTY 'verify fake pager detected and process marked interactive' '
+	test_when_finished "rm \"$LOGFILE\" event_exit" &&
+	rm -f paginated.out &&
+	rm -f "$LOGFILE" &&
+
+	test_terminal git log &&
+	test -e paginated.out &&
+
+	grep -f key_cmd_exit "$LOGFILE" >event_exit &&
+
+	perl "$TEST_DIRECTORY"/t0420/parse_json.perl <event_exit >parsed_exit &&
+
+	grep "row\[0\]\.child_summary\.pager\.count 1" <parsed_exit
+'
+
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v1 25/25] structured-logging: add config data facility
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (23 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 24/25] structured-logging: t0420 tests for interacitve child_summary git
@ 2018-07-13 16:56 ` git
  2018-07-13 18:51 ` [PATCH v1 00/25] RFC: structured logging David Lang
  2018-08-28 17:38 ` Junio C Hamano
  26 siblings, 0 replies; 38+ messages in thread
From: git @ 2018-07-13 16:56 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add "config" section to "cmd_exit" event to record important
configuration settings in the log.

Add the value of "slog.detail", "slog.timers", and "slog.aux" config
settings to the log.  These values control the filtering of the log.
Knowing the filter settings can help post-processors reason about
the contents of the log.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 structured-logging.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++
 structured-logging.h |  13 +++++
 2 files changed, 145 insertions(+)

diff --git a/structured-logging.c b/structured-logging.c
index 2571e79..0e3f79e 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -83,6 +83,20 @@ struct child_data_array {
 static struct child_data_array my__child_data;
 static void free_children(void);
 
+struct config_data {
+	char *group;
+	struct json_writer jw;
+};
+
+struct config_data_array {
+	struct config_data **array;
+	size_t nr, alloc;
+};
+
+static struct config_data_array my__config_data;
+static void format_config_data(struct json_writer *jw);
+static void free_config_data(void);
+
 static uint64_t my__start_time;
 static uint64_t my__exit_time;
 static int my__is_config_loaded;
@@ -132,6 +146,15 @@ static int want_category(const struct category_filter *cf, const char *category)
 	return !!strstr(cf->categories, category);
 }
 
+static void set_config_data_from_category(const struct category_filter *cf,
+					  const char *key)
+{
+	if (cf->want == 0 || cf->want == 1)
+		slog_set_config_data_intmax(key, cf->want);
+	else
+		slog_set_config_data_string(key, cf->categories);
+}
+
 /*
  * Compute a new session id for the current process.  Build string
  * with the start time and PID of the current process and append
@@ -249,6 +272,18 @@ static void emit_exit_event(void)
 	struct json_writer jw = JSON_WRITER_INIT;
 	uint64_t atexit_time = getnanotime() / 1000;
 
+	/*
+	 * Copy important (and non-obvious) config settings into the
+	 * "config" section of the "cmd_exit" event.  The values of
+	 * "slog.detail", "slog.timers", and "slog.aux" are used in
+	 * category want filtering, so post-processors should know the
+	 * filter settings so that they can tell if an event is missing
+	 * because of filtering or an error.
+	 */
+	set_config_data_from_category(&my__detail_categories, "slog.detail");
+	set_config_data_from_category(&my__timer_categories, "slog.timers");
+	set_config_data_from_category(&my__aux_categories, "slog.aux");
+
 	/* close unterminated forms */
 	if (my__errors.json.len)
 		jw_end(&my__errors);
@@ -299,6 +334,12 @@ static void emit_exit_event(void)
 		}
 		jw_end(&jw);
 
+		if (my__config_data.nr) {
+			jw_object_inline_begin_object(&jw, "config");
+			format_config_data(&jw);
+			jw_end(&jw);
+		}
+
 		if (my__timers.nr) {
 			jw_object_inline_begin_object(&jw, "timers");
 			format_timers(&jw);
@@ -479,6 +520,7 @@ static void do_final_steps(int in_signal)
 	free_child_summary_data();
 	free_timers();
 	free_children();
+	free_config_data();
 }
 
 static void slog_atexit(void)
@@ -1084,4 +1126,94 @@ static void free_children(void)
 	my__child_data.alloc = 0;
 }
 
+/*
+ * Split <key> into <group>.<sub_key> (for example "slog.path" into "slog" and "path")
+ * Find or insert <group> in config_data_array[].
+ *
+ * Return config_data_arary[<group>].
+ */
+static struct config_data *find_config_data(const char *key, const char **sub_key)
+{
+	struct config_data *cd;
+	char *dot;
+	size_t group_len;
+	int k;
+
+	dot = strchr(key, '.');
+	if (!dot)
+		return NULL;
+
+	*sub_key = dot + 1;
+
+	group_len = dot - key;
+
+	for (k = 0; k < my__config_data.nr; k++) {
+		cd = my__config_data.array[k];
+		if (!strncmp(key, cd->group, group_len))
+			return cd;
+	}
+
+	cd = xcalloc(1, sizeof(struct config_data));
+	cd->group = xstrndup(key, group_len);
+
+	jw_object_begin(&cd->jw, my__is_pretty);
+	/* leave per-group object unterminated for now */
+
+	ALLOC_GROW(my__config_data.array, my__config_data.nr + 1,
+		   my__config_data.alloc);
+	my__config_data.array[my__config_data.nr++] = cd;
+
+	return cd;
+}
+
+void slog_set_config_data_string(const char *key, const char *value)
+{
+	const char *sub_key;
+	struct config_data *cd = find_config_data(key, &sub_key);
+
+	if (cd)
+		jw_object_string(&cd->jw, sub_key, value);
+}
+
+void slog_set_config_data_intmax(const char *key, intmax_t value)
+{
+	const char *sub_key;
+	struct config_data *cd = find_config_data(key, &sub_key);
+
+	if (cd)
+		jw_object_intmax(&cd->jw, sub_key, value);
+}
+
+static void format_config_data(struct json_writer *jw)
+{
+	int k;
+
+	for (k = 0; k < my__config_data.nr; k++) {
+		struct config_data *cd = my__config_data.array[k];
+
+		/* termminate per-group form */
+		jw_end(&cd->jw);
+
+		/* insert per-category form into containing "config" form */
+		jw_object_sub_jw(jw, cd->group, &cd->jw);
+	}
+}
+
+static void free_config_data(void)
+{
+	int k;
+
+	for (k = 0; k < my__config_data.nr; k++) {
+		struct config_data *cd = my__config_data.array[k];
+
+		jw_release(&cd->jw);
+		free(cd->group);
+		free(cd);
+	}
+
+	FREE_AND_NULL(my__config_data.array);
+	my__config_data.nr = 0;
+	my__config_data.alloc = 0;
+}
+
 #endif
diff --git a/structured-logging.h b/structured-logging.h
index 7c98d33..2c90267 100644
--- a/structured-logging.h
+++ b/structured-logging.h
@@ -34,6 +34,8 @@ static inline void slog_stop_timer(int tid) { };
 #define slog_aux_jw(c, k, v) do { } while (0)
 #define slog_child_starting(cmd) (SLOG_UNDEFINED_CHILD_ID)
 #define slog_child_ended(i, p, ec) do { } while (0)
+#define slog_set_config_data_string(k, v) do { } while (0)
+#define slog_set_config_data_intmax(k, v) do { } while (0)
 
 #else
 
@@ -162,5 +164,16 @@ void slog_aux_jw(const char *category, const char *key,
 int slog_child_starting(const struct child_process *cmd);
 void slog_child_ended(int child_id, int child_pid, int child_exit_code);
 
+/*
+ * Add an important config key/value pair to the "cmd_event".  Keys
+ * are assumed to be of the form <group>.<name>, such as "slog.path".
+ * The pair will appear under the "config" object in the resulting JSON
+ * as "config.<group>.<name>:<value>".
+ *
+ * This should only be used for important config settings.
+ */
+void slog_set_config_data_string(const char *key, const char *value);
+void slog_set_config_data_intmax(const char *key, intmax_t value);
+
 #endif /* STRUCTURED_LOGGING */
 #endif /* STRUCTURED_LOGGING_H */
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 00/25] RFC: structured logging
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (24 preceding siblings ...)
  2018-07-13 16:56 ` [PATCH v1 25/25] structured-logging: add config data facility git
@ 2018-07-13 18:51 ` David Lang
  2018-07-16 13:29   ` Jeff Hostetler
  2018-08-28 17:38 ` Junio C Hamano
  26 siblings, 1 reply; 38+ messages in thread
From: David Lang @ 2018-07-13 18:51 UTC (permalink / raw)
  To: git; +Cc: git, gitster, peff, Jeff Hostetler

Please make an option for git to write these logs to syslog, not just a local 
file. Every modern syslog daemon has lots of tools to be able to deal with json 
messages well.

David Lang

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 01/25] structured-logging: design document
  2018-07-13 16:55 ` [PATCH v1 01/25] structured-logging: design document git
@ 2018-07-14  8:34   ` Simon Ruderich
  2018-08-03 15:26   ` Ben Peart
  2018-08-21  4:47   ` Jonathan Nieder
  2 siblings, 0 replies; 38+ messages in thread
From: Simon Ruderich @ 2018-07-14  8:34 UTC (permalink / raw)
  To: git; +Cc: git, gitster, peff, Jeff Hostetler

On Fri, Jul 13, 2018 at 04:55:57PM +0000, git@jeffhostetler.com wrote:
> diff --git a/Documentation/technical/structured-logging.txt b/Documentation/technical/structured-logging.txt
> new file mode 100644
> index 0000000..794c614
> --- /dev/null
> +++ b/Documentation/technical/structured-logging.txt
> @@ -0,0 +1,816 @@
> [snip]
>
> +"event": "cmd_start"
> +-------------------
> +
> +The "cmd_start" event is emitted when git starts when cmd_main() is
> +called.  In addition to the F1 fields, it contains the following
> +fields (F2):
> +
> +    "argv"        : <array-of-command-line-arguments>
> +
> +<argv> is an array of the original command line arguments given to the
> +    command (before git.c has a chance to remove the global options
> +    before the verb.

Missing closing parentheses.

> [snip]
>
> +<slog_{detail,timers,aux}> are the values of the corresponding
> +    "slog.{detail,timers,aux}" config setting.  Since these values
> +    control optional SLOG features and filtering, these are present
> +    to help post-processors know if an expected event did not happen
> +    or was simply filtered out.  (Described later)

Please write out "slog_detail, "slog_timers", etc. Using the
abbreviated forms makes searching a pain.

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 00/25] RFC: structured logging
  2018-07-13 18:51 ` [PATCH v1 00/25] RFC: structured logging David Lang
@ 2018-07-16 13:29   ` Jeff Hostetler
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff Hostetler @ 2018-07-16 13:29 UTC (permalink / raw)
  To: David Lang; +Cc: git, gitster, peff, Jeff Hostetler



On 7/13/2018 2:51 PM, David Lang wrote:
> Please make an option for git to write these logs to syslog, not just a 
> local file. Every modern syslog daemon has lots of tools to be able to 
> deal with json messages well.
> 
> David Lang

That is certainly possible and we can easily add it in a later draft,
but for now I'd like to stay platform-neutral and just log events to a
file.

My main goal right now is to get consensus on the basic structured 
logging framework -- the shape of the SLOG API, the event message
format, and etc.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 03/25] structured-logging: add structured logging framework
  2018-07-13 16:55 ` [PATCH v1 03/25] structured-logging: add structured logging framework git
@ 2018-07-26  9:09   ` SZEDER Gábor
  2018-07-27 12:45     ` Jeff Hostetler
  2018-08-21  5:05   ` Jonathan Nieder
  1 sibling, 1 reply; 38+ messages in thread
From: SZEDER Gábor @ 2018-07-26  9:09 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, git, gitster, peff, Jeff Hostetler


> +void slog_set_command_name(const char *command_name)
> +{
> +	/*
> +	 * Capture the command name even if logging is not enabled
> +	 * because we don't know if the config has been loaded yet by
> +	 * the cmd_<command>() and/or it may be too early to force a
> +	 * lazy load.
> +	 */
> +	if (my__command_name)
> +		free(my__command_name);
> +	my__command_name = xstrdup(command_name);
> +}
> +
> +void slog_set_sub_command_name(const char *sub_command_name)
> +{
> +	/*
> +	 * Capture the sub-command name even if logging is not enabled
> +	 * because we don't know if the config has been loaded yet by
> +	 * the cmd_<command>() and/or it may be too early to force a
> +	 * lazy load.
> +	 */
> +	if (my__sub_command_name)
> +		free(my__sub_command_name);

Please drop the condition in these two functions; free() handles NULL
arguments just fine.


(Sidenote: what's the deal with these 'my__' prefixes anyway?)


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 03/25] structured-logging: add structured logging framework
  2018-07-26  9:09   ` SZEDER Gábor
@ 2018-07-27 12:45     ` Jeff Hostetler
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff Hostetler @ 2018-07-27 12:45 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, gitster, peff, Jeff Hostetler



On 7/26/2018 5:09 AM, SZEDER Gábor wrote:
> 
>> +void slog_set_command_name(const char *command_name)
>> +{
>> +	/*
>> +	 * Capture the command name even if logging is not enabled
>> +	 * because we don't know if the config has been loaded yet by
>> +	 * the cmd_<command>() and/or it may be too early to force a
>> +	 * lazy load.
>> +	 */
>> +	if (my__command_name)
>> +		free(my__command_name);
>> +	my__command_name = xstrdup(command_name);
>> +}
>> +
>> +void slog_set_sub_command_name(const char *sub_command_name)
>> +{
>> +	/*
>> +	 * Capture the sub-command name even if logging is not enabled
>> +	 * because we don't know if the config has been loaded yet by
>> +	 * the cmd_<command>() and/or it may be too early to force a
>> +	 * lazy load.
>> +	 */
>> +	if (my__sub_command_name)
>> +		free(my__sub_command_name);
> 
> Please drop the condition in these two functions; free() handles NULL
> arguments just fine.

sure.

> 
> (Sidenote: what's the deal with these 'my__' prefixes anyway?)
> 

simply a way to identify file-scope variables and distinguish
them from local variables.

Jeff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 01/25] structured-logging: design document
  2018-07-13 16:55 ` [PATCH v1 01/25] structured-logging: design document git
  2018-07-14  8:34   ` Simon Ruderich
@ 2018-08-03 15:26   ` Ben Peart
  2018-08-09 14:30     ` Jeff Hostetler
  2018-08-21  4:47   ` Jonathan Nieder
  2 siblings, 1 reply; 38+ messages in thread
From: Ben Peart @ 2018-08-03 15:26 UTC (permalink / raw)
  To: git, git; +Cc: gitster, peff, Jeff Hostetler



On 7/13/2018 12:55 PM, git@jeffhostetler.com wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>   Documentation/technical/structured-logging.txt | 816 +++++++++++++++++++++++++
>   1 file changed, 816 insertions(+)
>   create mode 100644 Documentation/technical/structured-logging.txt
> 
> diff --git a/Documentation/technical/structured-logging.txt b/Documentation/technical/structured-logging.txt
> new file mode 100644
> index 0000000..794c614
> --- /dev/null
> +++ b/Documentation/technical/structured-logging.txt
> @@ -0,0 +1,816 @@
> +Structured Logging
> +==================
> +
> +Structured Logging (SLOG) is an optional feature to allow Git to
> +generate structured log data for executed commands.  This includes
> +command line arguments, command run times, error codes and messages,
> +child process information, time spent in various critical functions,
> +and repository data-shape information.  Data is written to a target
> +log file in JSON[1,2,3] format.
> +
> +SLOG is disabled by default.  Several steps are required to enable it:
> +
> +1. Add the compile-time flag "STRUCTURED_LOGGING=1" when building git
> +   to include the SLOG routines in the git executable.
> +

Is the intent to remove this compile-time flag before this is merged? 
With it off by default in builds, the audience for this is limited to 
those who build their own/custom versions of git. I can see other 
organizations wanting to use this that don't have a custom fork of git 
they build and install on their users machines.

Like the GIT_TRACE mechanism today, I think this should be compiled in 
but turned off via the default settings by default.

> +2. Set "slog.*" config settings[5] to enable SLOG in your repo.
> +
> +
> +Motivation
> +==========
> +
> +Git users may be faced with scenarios that are surprisingly slow or
> +produce unexpected results.  And Git developers may have difficulty
> +reproducing these experiences.  Structured logging allows users to
> +provide developers with additional usage, performance and error data
> +that can help diagnose and debug issues.
> +
> +Many Git hosting providers and users with many developers have bespoke
> +efforts to help troubleshoot problems; for example, command wrappers,
> +custom pre- and post-command hooks, and custom instrumentation of Git
> +code.  These are inefficient and/or difficult to maintain.  The goal
> +of SLOG is to provide this data as efficiently as possible.
> +
> +And having structured rather than free format log data, will help
> +developers with their analysis.
> +
> +
> +Background (Git Merge 2018 Barcelona)
> +=====================================
> +
> +Performance and/or error logging was discussed during the contributor's
> +summit in Barcelona.  Here are the relevant notes from the meeting
> +minutes[6].
> +
> +> Performance misc (Ævar)
> +> -----------------------
> +> [...]
> +>  - central error reporting for git
> +>    - `git status` logging
> +>    - git config that collects data, pushes to known endpoint with `git push`
> +>    - pre_command and post_command hooks, for logs
> +>    - `gvfs diagnose` that looks at packfiles, etc
> +>    - detect BSODs, etc
> +>    - Dropbox writes out json with index properties and command-line
> +>        information for status/fetch/push, fork/execs external tool to upload
> +>    - windows trace facility; would be nice to have cross-platform
> +>    - would hosting providers care?
> +>    - zipfile of logs to give when debugging
> +>    - sanitizing data is harder
> +>    - more in a company setting
> +>    - fileshare to upload zipfile
> +>    - most of the errors are proxy when they shouldn't, wrong proxy, proxy
> +>        specific to particular URL; so upload endpoint wouldn't work
> +>    - GIT_TRACE is supposed to be that (for proxy)
> +>    - but we need more trace variables
> +>    - series to make tracing cheaper
> +>    - except that curl selects the proxy
> +>    - trace should have an API, so it can call an executable
> +>    - dump to .git/traces/... and everything else happens externally
> +>    - tools like visual studio can't set GIT_TRACE, so
> +>    - sourcetree has seen user environments where commands just take forever
> +>    - third-party tools like perf/strace - could we be better leveraging those?
> +>    - distribute turn-key solution to handout to collect more data?
> +

While it makes sense to have clear goals in the design document, the 
motivation and background sections feel somehow out of place.  I'd 
recommend you clearly articulate the design goals and drop the 
background data that led you to the goals.

> +
> +A Quick Example
> +===============
> +
> +Note: JSON pretty-printing is enabled in all of the examples shown in
> +this document.  When pretty-printing is turned off, each event is
> +written on a single line.  Pretty-printing is intended for debugging.
> +It should be turned off in production to make post-processing easier.
> + > +    $ git config slog.pretty <bool>
> +

nit - I'd move this "Note:" section to the end of your "A Quick 
Example." While it's good to understand about pretty printing, its not 
the first or most important thing.

> +Here is a quick example showing SLOG data for "git status".  This
> +example has all optional features turned off.  It contains 2 events.
> +The first is generated when the command started and the second when it
> +ended.
> +
> +{
> +  "event": "cmd_start",
> +  "clock_us": 1530273550667800,
> +  "pid": 107270,
> +  "sid": "1530273550667800-107270",
> +  "command": "status",
> +  "argv": [
> +    "./git",
> +    "status"
> +  ]
> +}
> +{
> +  "event": "cmd_exit",
> +  "clock_us": 1530273550680460,
> +  "pid": 107270,
> +  "sid": "1530273550667800-107270",
> +  "command": "status",
> +  "argv": [
> +    "./git",
> +    "status"
> +  ],
> +  "result": {
> +    "exit_code": 0,
> +    "elapsed_core_us": 12573,
> +    "elapsed_total_us": 12660
> +  },
> +  "version": {
> +    "git": "2.18.0.rc0.83.gde7fb7c",
> +    "slog": 0
> +  },
> +  "config": {
> +    "slog": {
> +      "detail": 0,
> +      "timers": 0,
> +      "aux": 0
> +    }
> +  }
> +}
> +
> +Both events fields describing the event name, the event time (in

Maybe "Both events have fields describing the event name..."

> +microseconds since the epoch), the OS process-id, a unique session-id
> +(described later), the normalized command name, and a vector of the
> +original command line arguments.
> +
> +The "cmd_exit" event additionally contains information about how the
> +process exited, the elapsed time, the version of git and the SLOG
> +format, and important SLOG configuration settings.
> +

At first glance, the SLOG configuration settings seem out of place but 
I'll read on to see why they are needed with every cmd_exit event...

> +The fields in the "cmd_start" event are replicated in the "cmd_exit"
> +event.  This allows log file post-processors to operate in 2 modes:
> +
> +1. Just look at the "cmd_exit" events.  This us useful if you just
> +   want to analyze the command summary data.
> +
> +2. Look at the "cmd_start" and "cmd_exit" events as bracketing a time
> +   span and examine the detailed activity between.  For example, SLOG
> +   can optionally generate "detail" events when spawning child
> +   processes and those processes may themselves generate "cmd_start"
> +   and "cmd_exit" events.  The (top-level) "cmd_start" event serves as
> +   the starting bracket of all of that activity.
> +

I'm assuming the sid is what will enable correlating the child process 
events with the outer command?

> +
> +Target Log File
> +===============
> +
> +SLOG writes events to a log file.  File logging works much like
> +GIT_TRACE where events are appended to a file on disk.
> +
> +Logging is enabled if the config variable "slog.path" is set to an
> +absolute pathname.
> +
> +As with GIT_TRACE, this file is local and private to the user's
> +system.  Log file management and rotation is beyond the scope of the
> +SLOG effort.
> +
> +Similarly, if a user wants to provide this data to a developer, they
> +must explicitly make these log files available; SLOG does not
> +broadcast any of this information.  It is up to the users of this
> +system to decide if any sensitive information should be sanitized, and
> +how to export the logs.
> +

It's good to clarify this.

> +
> +Comparison with GIT_TRACE
> +=========================
> +
> +SLOG is very similar to the existing GIT_TRACE[4] API because both
> +write event messages at various points during a command.  However,
> +there are some fundamental differences that warrant it being
> +considered a separate feature rather than just another
> +GIT_TRACE_<key>:
> +

But it does make me wonder if we need to keep both systems.  If there 
are two, as a developer I need to know which I should use.  What is the 
criteria I should use to decide between adding GIT_TRACE and SLOG 
tracing?  It seems like it would be best if we could converge on a 
single tracing model.

> +1. GIT_TRACE events are unrelated, line-by-line logging.  SLOG has
> +   line-by-line events that show command progress and can serve as
> +   structured debug messages.  SLOG also supports accumulating summary
> +   data (such as timers) that are automatically added to the final
> +   `cmd_exit` event.
> +
> +2. GIT_TRACE events are unstructured free format printf-style messages
> +   which makes post-processing difficult.  SLOG events are written in
> +   JSON and can be easily parsed using Perl, Python, and other tools.
> +
> +3. SLOG uses a well-defined API to build SLOG events containing
> +   well-defined fields to make post-command analysis easier.
> +
> +4. It should be easier to filter/redact sensitive information from
> +   SLOG data than from free form data.
> +
> +5. GIT_TRACE events are controlled by one or more global environment
> +   variables which makes it awkward to selectively log some repos and
> +   not others.  SLOG events are controlled by a few configuration
> +   settings[5].  Users (or system administrators) can configure
> +   logging using repo-local or global config settings.
> +
> +6. GIT_TRACE events do not identify the git process.  This makes it
> +   difficult to associate all of events from a particular command.
> +   Each SLOG event contains a session id to allow all events for a
> +   command to be identified.
> +
> +7. Some git commands spawn child git commands.  GIT_TRACE has no
> +   mechanism to associate events from a child process with the parent
> +   process.  SLOG session ids allow child/parent relationships to be
> +   tracked (even if there is an intermediate /bin/sh process between
> +   them).
> +
> +8. GIT_TRACE supports logging to a file or stderr.  SLOG only logs to
> +   a file.
> +
> +9. Smashing SLOG into GIT_TRACE doesn't feel like a good fit.  The 2
> +   APIs share nothing other than the concept that they write logging
> +   data.
> +

Is SLOG a superset of GIT_TRACE? If not, could it be?  While SLOG can't 
be smashed into GIT_TRACE, can the functionality of GIT_TRACE be 
subsumed by SLOG so that we can converge on a single tracing system?

I'd like to see a design goal for this effort to be that we come up with 
a single tracing mechanism that meets _all_ the requirements (including 
those currently met by GIT_TRACE).

> +
> +[1] http://json.org/
> +[2] http://www.ietf.org/rfc/rfc7159.txt
> +[3] See UTF-8 limitations described in json-writer.h
> +[4] Documentation/technical/api-trace.txt
> +[5] See "slog.*" in Documentation/config.txt
> +[6] https://public-inbox.org/git/20180313004940.GG61720@google.com/t/
> +
> +
> +SLOG Format (V0)
> +================
> +
> +SLOG writes a series of events to the log target.  Each event is a
> +self-describing JSON object.
> +
> +    <event> LF
> +    <event> LF
> +    <event> LF
> +    ...
> +
> +Each event record in the log file is an independent and complete JSON
> +object.  JSON parsers should process the file line-by-line rather than
> +trying to parse the entire file into a single object.
> +
> +    Note: It may be difficult for parsers to find record boundaries if
> +    pretty-printing is enabled, so it recommended that pretty-printing
> +    only be enabled for interactive debugging and analysis.
> +
> +Every <event> contains the following fields (F1):
> +
> +    "event"       : <event_name>
> +    "clock_us"    : <event_time>
> +    "pid"         : <os_pid>
> +    "sid"         : <session_id>
> +
> +    "command"     : <command_name>
> +    "sub_command" : <sub_command_name> (optional)
> +
> +<event_name> is one of "cmd_start", "cmd_end", or "detail".
> +
> +<event_time> is the time of the event in microseconds since the epoch.
> +
> +<os_pid> is the process-id (from getpid()).
> +
> +<session_id> is a session-id.  (Described later)
> +
> +<command_name> is a (possibly normalized) command name.  This is
> +    usually taken from the cmd_struct[] table after git parses the
> +    command line and calls the appropriate cmd_<name>() function.
> +    Having it in a top-level field saves post-processors from having
> +    to re-parse the command line to discover it.
> +
> +<sub_command_name> further qualifies the command.  This field is
> +    present for common commands that have multiple command modes.  For
> +    example, checkout can either change branches and do a full
> +    checkout or it can checkout (refresh) an individual file.  A
> +    post-processor wanting to compute percentiles for the time spent
> +    by branch-changing checkouts could easily filter out the
> +    individual file checkouts (and without having to re-parse the
> +    command line).
> +
> +    The set of sub_command values are command-specific and are not
> +    listed here.

The sub-command definition seems a little squishy but I guess since you
have the entire set of command line arguments, it's really just a hint
as you could always drop back to looking at the arguments yourself.

> +
> +"event": "cmd_start"
> +-------------------
> +
> +The "cmd_start" event is emitted when git starts when cmd_main() is
> +called.  In addition to the F1 fields, it contains the following
> +fields (F2):
> +
> +    "argv"        : <array-of-command-line-arguments>
> +
> +<argv> is an array of the original command line arguments given to the
> +    command (before git.c has a chance to remove the global options
> +    before the verb.
> +
> +
> +"event": "cmd_exit"
> +-------------------
> +
> +The "cmd_exit" event is emitted immediately before git exits (during
> +an atexit() routine).  It contains the F1 and F2 fields as described
> +above.  It also contains the the following fields (F3):
> +
> +    "result.exit_code"        : <exit_code>
> +    "result.errors"           : <arrary_of_error_messages> (optional)
> +    "result.elapsed_core_us"  : <elapsed_time_to_exit>
> +    "result.elapsed_total_us" : <elapsed_time_to_atexit>
> +    "result.signal"           : <signal_value> (optional)
> +
> +    "verion.git"              : <git_version>
> +    "version.slog"            : <slog_version>
> +
> +    "config.slog.detail"      : <slog_detail>
> +    "config.slog.timers"      : <slog_timers>
> +    "config.slog.aux"         : <slog_aux>
> +    "config.*.*"              : <other_config_settings> (optional)
> +
> +    "timers"                  : <timers> (optional)
> +    "aux"                     : <aux> (optional)
> +
> +    "child_summary"           : <child_summary> (optional)
> +
> +<exit_code> is the value passed to exit() or returned from main().
> +
> +<array_of_error_messages> is an array of messages passed to the die()
> +    and error() functions.
> +
> +<elapsed_time_to_exit> is the elapsed time from start until exit()
> +    was called or main() returned.
> +
> +<elapsed_time_to_atexit> is the elapsed time from start until the slog
> +    atexit routine was called.  This time will include any time
> +    required to shut down or wait for the pager to complete.
> +

I wonder how valuable the difference is between these two and if we 
could simply drop the <elapsed_time_to_atexit>

> +<signal_value> is present if the command was stopped by a single,

s/single/signal

> +    such as a SIGPIPE when the pager is quit.
> +
> +<git_version> is the git version number as reported by "git version".
> +
> +<slog_version> is the SLOG format version.
> +
> +<slog_{detail,timers,aux}> are the values of the corresponding
> +    "slog.{detail,timers,aux}" config setting.  Since these values
> +    control optional SLOG features and filtering, these are present
> +    to help post-processors know if an expected event did not happen
> +    or was simply filtered out.  (Described later)
> +
> +<other_config_settings> is a place for developers to add additional
> +    important config settings to the log.  This is not intended as a
> +    dumping ground for all config settings, but rather only ones that
> +    might affect performance or allow A/B testing in production.
> +
> +<timers> is a structure of any SLOG timers used during the process.
> +    (Described later)
> +
> +<aux> is a structure of any "aux data" generated during the process.
> +    (Described later)
> +
> +<child_summary> is a structure summarizing child processes by class.
> +    (Described later)
> +
> +
> +"event": "detail" and config setting "slog.detail"
> +--------------------------------------------------
> +
> +The "detail" event is used to report progress and/or debug information
> +during a command.  It is a line-by-line (rather than summary) event.
> +Like GIT_TRACE_<key>, detail events are classified by "category" and
> +may be included or omitted based on the "slog.detail" config setting.
> +
> +Here are 3 example "detail" events:
> +
> +{
> +  "event": "detail",
> +  "clock_us": 1530273485479387,
> +  "pid": 107253,
> +  "sid": "1530273485473820-107253",
> +  "command": "status",
> +  "detail": {
> +    "category": "index",
> +    "label": "lazy_init_name_hash",
> +    "data": {
> +      "cache_nr": 3269,
> +      "elapsed_us": 195,
> +      "dir_count": 0,
> +      "dir_tablesize": 4096,
> +      "name_count": 3269,
> +      "name_tablesize": 4096
> +    }
> +  }
> +}
> +{
> +  "event": "detail",
> +  "clock_us": 1530283184051338,
> +  "pid": 109679,
> +  "sid": "1530283180782876-109679",
> +  "command": "fetch",
> +  "detail": {
> +    "category": "child",
> +    "label": "child_starting",
> +    "data": {
> +      "child_id": 3,
> +      "git_cmd": true,
> +      "use_shell": false,
> +      "is_interactive": false,
> +      "child_argv": [
> +        "gc",
> +        "--auto"
> +      ]
> +    }
> +  }
> +}
> +{
> +  "event": "detail",
> +  "clock_us": 1530283184053158,
> +  "pid": 109679,
> +  "sid": "1530283180782876-109679",
> +  "command": "fetch",
> +  "detail": {
> +    "category": "child",
> +    "label": "child_ended",
> +    "data": {
> +      "child_id": 3,
> +      "git_cmd": true,
> +      "use_shell": false,
> +      "is_interactive": false,
> +      "child_argv": [
> +        "gc",
> +        "--auto"
> +      ],
> +      "child_pid": 109684,
> +      "child_exit_code": 0,
> +      "child_elapsed_us": 1819
> +    }
> +  }
> +}
> +
> +A detail event contains the common F1 described earlier.  It also
> +contains 2 fixed fields and 1 variable field:
> +
> +    "detail.category" : <detail_category>
> +    "detail.label"    : <detail_label>
> +    "detail.data"     : <detail_data>
> +
> +<detail_category> is the "category" name for the event.  This is
> +    similar to GIT_TRACE_<key>.  In the example above we have 1
> +    "index" and 2 "child" category events.
> +
> +    If the config setting "slog.detail" is true or contains this
> +    category name, the event will be generated.  If "slog.detail"
> +    is false, no detail events will be generated.
> +
> +    $ git config slog.detail true
> +    $ git config slog.detail child,index,status
> +    $ git config slog.detail false
> +
> +<detail_label> is a descriptive label for the event.  It may be the
> +    name of a function or any meaningful value.
> +
> +<detail_data> is a JSON structure containing context-specific data for
> +    the event.  This replaces the need for printf-like trace messages.

Hmm, couldn't this (a new "Trace Detail Event) be used to help replace 
the existing GIT_TRACE mechanism? I think one missing piece is a 
formatting option other than "pretty." Perhaps another field that is a 
format string that can be used to give it a richer human readable output?

Overall this looks very robust and capable.  Clearly my biggest feedback 
is that I'd like to see this be able to replace the existing GIT_TRACE 
and GIT_TRACE_PERFORMANCE support.  Having two different mechanisms 
seems unnecessarily complex and will increase the mental burden when 
adding tracing.  It also doubles the code we need to 
write/debug/document/support.  I'd hate to see callers of the tracing 
code having to double up and including calls to both tracing mechanisms.

I'm OK if this is a goal and migrating the existing calls to GIT_TRACE 
are cleaned up over time.  It may be a nice intermediate step if the 
existing macros could be re-implemented to sit on top of the new SLOG 
infrastructure. This would certainly make it easier to migrate the code 
base over time.

<snip>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 01/25] structured-logging: design document
  2018-08-03 15:26   ` Ben Peart
@ 2018-08-09 14:30     ` Jeff Hostetler
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff Hostetler @ 2018-08-09 14:30 UTC (permalink / raw)
  To: Ben Peart, git; +Cc: gitster, peff, Jeff Hostetler



On 8/3/2018 11:26 AM, Ben Peart wrote:
> 
> 
> On 7/13/2018 12:55 PM, git@jeffhostetler.com wrote:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>> ---
>>   Documentation/technical/structured-logging.txt | 816 
>> +++++++++++++++++++++++++
>>   1 file changed, 816 insertions(+)
>>   create mode 100644 Documentation/technical/structured-logging.txt
>>
>> diff --git a/Documentation/technical/structured-logging.txt 
>> b/Documentation/technical/structured-logging.txt
>> new file mode 100644
>> index 0000000..794c614
>> --- /dev/null
>> +++ b/Documentation/technical/structured-logging.txt
>> @@ -0,0 +1,816 @@
>> +Structured Logging
>> +==================
>> +
>> +Structured Logging (SLOG) is an optional feature to allow Git to
>> +generate structured log data for executed commands.  This includes
>> +command line arguments, command run times, error codes and messages,
>> +child process information, time spent in various critical functions,
>> +and repository data-shape information.  Data is written to a target
>> +log file in JSON[1,2,3] format.
>> +
>> +SLOG is disabled by default.  Several steps are required to enable it:
>> +
>> +1. Add the compile-time flag "STRUCTURED_LOGGING=1" when building git
>> +   to include the SLOG routines in the git executable.
>> +
> 
> Is the intent to remove this compile-time flag before this is merged? 
> With it off by default in builds, the audience for this is limited to 
> those who build their own/custom versions of git. I can see other 
> organizations wanting to use this that don't have a custom fork of git 
> they build and install on their users machines.
> 
> Like the GIT_TRACE mechanism today, I think this should be compiled in 
> but turned off via the default settings by default.

I would like to get rid of this compile-time flag and just have it
be available to those who want to use it.  And defaulted to off.
But I wasn't sure what kind of reaction or level of interest this
feature would receive from the mailing list.


>> +2. Set "slog.*" config settings[5] to enable SLOG in your repo.
>> +
>> +
>> +Motivation
>> +==========
>> +
>> +Git users may be faced with scenarios that are surprisingly slow or
>> +produce unexpected results.  And Git developers may have difficulty
>> +reproducing these experiences.  Structured logging allows users to
>> +provide developers with additional usage, performance and error data
>> +that can help diagnose and debug issues.
>> +
>> +Many Git hosting providers and users with many developers have bespoke
>> +efforts to help troubleshoot problems; for example, command wrappers,
>> +custom pre- and post-command hooks, and custom instrumentation of Git
>> +code.  These are inefficient and/or difficult to maintain.  The goal
>> +of SLOG is to provide this data as efficiently as possible.
>> +
>> +And having structured rather than free format log data, will help
>> +developers with their analysis.
>> +
>> +
>> +Background (Git Merge 2018 Barcelona)
>> +=====================================
>> +
>> +Performance and/or error logging was discussed during the contributor's
>> +summit in Barcelona.  Here are the relevant notes from the meeting
>> +minutes[6].
>> +
>> +> Performance misc (Ævar)
>> +> -----------------------
>> +> [...]
>> +>  - central error reporting for git
>> +>    - `git status` logging
>> +>    - git config that collects data, pushes to known endpoint with 
>> `git push`
>> +>    - pre_command and post_command hooks, for logs
>> +>    - `gvfs diagnose` that looks at packfiles, etc
>> +>    - detect BSODs, etc
>> +>    - Dropbox writes out json with index properties and command-line
>> +>        information for status/fetch/push, fork/execs external tool 
>> to upload
>> +>    - windows trace facility; would be nice to have cross-platform
>> +>    - would hosting providers care?
>> +>    - zipfile of logs to give when debugging
>> +>    - sanitizing data is harder
>> +>    - more in a company setting
>> +>    - fileshare to upload zipfile
>> +>    - most of the errors are proxy when they shouldn't, wrong proxy, 
>> proxy
>> +>        specific to particular URL; so upload endpoint wouldn't work
>> +>    - GIT_TRACE is supposed to be that (for proxy)
>> +>    - but we need more trace variables
>> +>    - series to make tracing cheaper
>> +>    - except that curl selects the proxy
>> +>    - trace should have an API, so it can call an executable
>> +>    - dump to .git/traces/... and everything else happens externally
>> +>    - tools like visual studio can't set GIT_TRACE, so
>> +>    - sourcetree has seen user environments where commands just take 
>> forever
>> +>    - third-party tools like perf/strace - could we be better 
>> leveraging those?
>> +>    - distribute turn-key solution to handout to collect more data?
>> +
> 
> While it makes sense to have clear goals in the design document, the 
> motivation and background sections feel somehow out of place.  I'd 
> recommend you clearly articulate the design goals and drop the 
> background data that led you to the goals.

good point.  thanks.


>> +
>> +A Quick Example
>> +===============
>> +
>> +Note: JSON pretty-printing is enabled in all of the examples shown in
>> +this document.  When pretty-printing is turned off, each event is
>> +written on a single line.  Pretty-printing is intended for debugging.
>> +It should be turned off in production to make post-processing easier.
>> + > +    $ git config slog.pretty <bool>
>> +
> 
> nit - I'd move this "Note:" section to the end of your "A Quick 
> Example." While it's good to understand about pretty printing, its not 
> the first or most important thing.
> 
>> +Here is a quick example showing SLOG data for "git status".  This
>> +example has all optional features turned off.  It contains 2 events.
>> +The first is generated when the command started and the second when it
>> +ended.
>> +
>> +{
>> +  "event": "cmd_start",
>> +  "clock_us": 1530273550667800,
>> +  "pid": 107270,
>> +  "sid": "1530273550667800-107270",
>> +  "command": "status",
>> +  "argv": [
>> +    "./git",
>> +    "status"
>> +  ]
>> +}
>> +{
>> +  "event": "cmd_exit",
>> +  "clock_us": 1530273550680460,
>> +  "pid": 107270,
>> +  "sid": "1530273550667800-107270",
>> +  "command": "status",
>> +  "argv": [
>> +    "./git",
>> +    "status"
>> +  ],
>> +  "result": {
>> +    "exit_code": 0,
>> +    "elapsed_core_us": 12573,
>> +    "elapsed_total_us": 12660
>> +  },
>> +  "version": {
>> +    "git": "2.18.0.rc0.83.gde7fb7c",
>> +    "slog": 0
>> +  },
>> +  "config": {
>> +    "slog": {
>> +      "detail": 0,
>> +      "timers": 0,
>> +      "aux": 0
>> +    }
>> +  }
>> +}
>> +
>> +Both events fields describing the event name, the event time (in
> 
> Maybe "Both events have fields describing the event name..."
> 
>> +microseconds since the epoch), the OS process-id, a unique session-id
>> +(described later), the normalized command name, and a vector of the
>> +original command line arguments.
>> +
>> +The "cmd_exit" event additionally contains information about how the
>> +process exited, the elapsed time, the version of git and the SLOG
>> +format, and important SLOG configuration settings.
>> +
> 
> At first glance, the SLOG configuration settings seem out of place but 
> I'll read on to see why they are needed with every cmd_exit event...
> 

Each git command process is independent.  It just emits cmd_start,
detail, cmd_exit events for itself w/o knowing if a previous git
command just wrote a cmd_exit event with those details.  So I included
them in each cmd_exit event.


>> +The fields in the "cmd_start" event are replicated in the "cmd_exit"
>> +event.  This allows log file post-processors to operate in 2 modes:
>> +
>> +1. Just look at the "cmd_exit" events.  This us useful if you just
>> +   want to analyze the command summary data.
>> +
>> +2. Look at the "cmd_start" and "cmd_exit" events as bracketing a time
>> +   span and examine the detailed activity between.  For example, SLOG
>> +   can optionally generate "detail" events when spawning child
>> +   processes and those processes may themselves generate "cmd_start"
>> +   and "cmd_exit" events.  The (top-level) "cmd_start" event serves as
>> +   the starting bracket of all of that activity.
>> +
> 
> I'm assuming the sid is what will enable correlating the child process 
> events with the outer command?

Yes, the SID for a child git process contains the SID of its parent
git process.

> 
>> +
>> +Target Log File
>> +===============
>> +
>> +SLOG writes events to a log file.  File logging works much like
>> +GIT_TRACE where events are appended to a file on disk.
>> +
>> +Logging is enabled if the config variable "slog.path" is set to an
>> +absolute pathname.
>> +
>> +As with GIT_TRACE, this file is local and private to the user's
>> +system.  Log file management and rotation is beyond the scope of the
>> +SLOG effort.
>> +
>> +Similarly, if a user wants to provide this data to a developer, they
>> +must explicitly make these log files available; SLOG does not
>> +broadcast any of this information.  It is up to the users of this
>> +system to decide if any sensitive information should be sanitized, and
>> +how to export the logs.
>> +
> 
> It's good to clarify this.
> 
>> +
>> +Comparison with GIT_TRACE
>> +=========================
>> +
>> +SLOG is very similar to the existing GIT_TRACE[4] API because both
>> +write event messages at various points during a command.  However,
>> +there are some fundamental differences that warrant it being
>> +considered a separate feature rather than just another
>> +GIT_TRACE_<key>:
>> +
> 
> But it does make me wonder if we need to keep both systems.  If there 
> are two, as a developer I need to know which I should use.  What is the 
> criteria I should use to decide between adding GIT_TRACE and SLOG 
> tracing?  It seems like it would be best if we could converge on a 
> single tracing model.
> 
>> +1. GIT_TRACE events are unrelated, line-by-line logging.  SLOG has
>> +   line-by-line events that show command progress and can serve as
>> +   structured debug messages.  SLOG also supports accumulating summary
>> +   data (such as timers) that are automatically added to the final
>> +   `cmd_exit` event.
>> +
>> +2. GIT_TRACE events are unstructured free format printf-style messages
>> +   which makes post-processing difficult.  SLOG events are written in
>> +   JSON and can be easily parsed using Perl, Python, and other tools.
>> +
>> +3. SLOG uses a well-defined API to build SLOG events containing
>> +   well-defined fields to make post-command analysis easier.
>> +
>> +4. It should be easier to filter/redact sensitive information from
>> +   SLOG data than from free form data.
>> +
>> +5. GIT_TRACE events are controlled by one or more global environment
>> +   variables which makes it awkward to selectively log some repos and
>> +   not others.  SLOG events are controlled by a few configuration
>> +   settings[5].  Users (or system administrators) can configure
>> +   logging using repo-local or global config settings.
>> +
>> +6. GIT_TRACE events do not identify the git process.  This makes it
>> +   difficult to associate all of events from a particular command.
>> +   Each SLOG event contains a session id to allow all events for a
>> +   command to be identified.
>> +
>> +7. Some git commands spawn child git commands.  GIT_TRACE has no
>> +   mechanism to associate events from a child process with the parent
>> +   process.  SLOG session ids allow child/parent relationships to be
>> +   tracked (even if there is an intermediate /bin/sh process between
>> +   them).
>> +
>> +8. GIT_TRACE supports logging to a file or stderr.  SLOG only logs to
>> +   a file.
>> +
>> +9. Smashing SLOG into GIT_TRACE doesn't feel like a good fit.  The 2
>> +   APIs share nothing other than the concept that they write logging
>> +   data.
>> +
> 
> Is SLOG a superset of GIT_TRACE? If not, could it be?  While SLOG can't 
> be smashed into GIT_TRACE, can the functionality of GIT_TRACE be 
> subsumed by SLOG so that we can converge on a single tracing system?
> 
> I'd like to see a design goal for this effort to be that we come up with 
> a single tracing mechanism that meets _all_ the requirements (including 
> those currently met by GIT_TRACE).

It would be good to consolidate them for the reasons you suggest.
I initially tried adding SLOG into GIT_TRACE and it just didn't fit.
I hesitated trying to convert GIT_TRACE because of the footprint of
the changes and what I have here is already large.

Let me take another look and see.

> 
>> +
>> +[1] http://json.org/
>> +[2] http://www.ietf.org/rfc/rfc7159.txt
>> +[3] See UTF-8 limitations described in json-writer.h
>> +[4] Documentation/technical/api-trace.txt
>> +[5] See "slog.*" in Documentation/config.txt
>> +[6] https://public-inbox.org/git/20180313004940.GG61720@google.com/t/
>> +
>> +
>> +SLOG Format (V0)
>> +================
>> +
>> +SLOG writes a series of events to the log target.  Each event is a
>> +self-describing JSON object.
>> +
>> +    <event> LF
>> +    <event> LF
>> +    <event> LF
>> +    ...
>> +
>> +Each event record in the log file is an independent and complete JSON
>> +object.  JSON parsers should process the file line-by-line rather than
>> +trying to parse the entire file into a single object.
>> +
>> +    Note: It may be difficult for parsers to find record boundaries if
>> +    pretty-printing is enabled, so it recommended that pretty-printing
>> +    only be enabled for interactive debugging and analysis.
>> +
>> +Every <event> contains the following fields (F1):
>> +
>> +    "event"       : <event_name>
>> +    "clock_us"    : <event_time>
>> +    "pid"         : <os_pid>
>> +    "sid"         : <session_id>
>> +
>> +    "command"     : <command_name>
>> +    "sub_command" : <sub_command_name> (optional)
>> +
>> +<event_name> is one of "cmd_start", "cmd_end", or "detail".
>> +
>> +<event_time> is the time of the event in microseconds since the epoch.
>> +
>> +<os_pid> is the process-id (from getpid()).
>> +
>> +<session_id> is a session-id.  (Described later)
>> +
>> +<command_name> is a (possibly normalized) command name.  This is
>> +    usually taken from the cmd_struct[] table after git parses the
>> +    command line and calls the appropriate cmd_<name>() function.
>> +    Having it in a top-level field saves post-processors from having
>> +    to re-parse the command line to discover it.
>> +
>> +<sub_command_name> further qualifies the command.  This field is
>> +    present for common commands that have multiple command modes.  For
>> +    example, checkout can either change branches and do a full
>> +    checkout or it can checkout (refresh) an individual file.  A
>> +    post-processor wanting to compute percentiles for the time spent
>> +    by branch-changing checkouts could easily filter out the
>> +    individual file checkouts (and without having to re-parse the
>> +    command line).
>> +
>> +    The set of sub_command values are command-specific and are not
>> +    listed here.
> 
> The sub-command definition seems a little squishy but I guess since you
> have the entire set of command line arguments, it's really just a hint
> as you could always drop back to looking at the arguments yourself.

Yes, it is a little squishy and I only added values for a few commands
that I was interested in. Others can be added later.

Without something like this, post-processing is difficult as it would
need reparse the command args and/or expand aliases to figure out what
the type of command -- for example, a branch-changing checkout.  Having
this field lets me dump the log file into SQL and select just those
records, for example.

Granted, not strictly necessary to be inside git.exe, but it saves a
lot of post-processing effort to recreate that information.


> 
>> +
>> +"event": "cmd_start"
>> +-------------------
>> +
>> +The "cmd_start" event is emitted when git starts when cmd_main() is
>> +called.  In addition to the F1 fields, it contains the following
>> +fields (F2):
>> +
>> +    "argv"        : <array-of-command-line-arguments>
>> +
>> +<argv> is an array of the original command line arguments given to the
>> +    command (before git.c has a chance to remove the global options
>> +    before the verb.
>> +
>> +
>> +"event": "cmd_exit"
>> +-------------------
>> +
>> +The "cmd_exit" event is emitted immediately before git exits (during
>> +an atexit() routine).  It contains the F1 and F2 fields as described
>> +above.  It also contains the the following fields (F3):
>> +
>> +    "result.exit_code"        : <exit_code>
>> +    "result.errors"           : <arrary_of_error_messages> (optional)
>> +    "result.elapsed_core_us"  : <elapsed_time_to_exit>
>> +    "result.elapsed_total_us" : <elapsed_time_to_atexit>
>> +    "result.signal"           : <signal_value> (optional)
>> +
>> +    "verion.git"              : <git_version>
>> +    "version.slog"            : <slog_version>
>> +
>> +    "config.slog.detail"      : <slog_detail>
>> +    "config.slog.timers"      : <slog_timers>
>> +    "config.slog.aux"         : <slog_aux>
>> +    "config.*.*"              : <other_config_settings> (optional)
>> +
>> +    "timers"                  : <timers> (optional)
>> +    "aux"                     : <aux> (optional)
>> +
>> +    "child_summary"           : <child_summary> (optional)
>> +
>> +<exit_code> is the value passed to exit() or returned from main().
>> +
>> +<array_of_error_messages> is an array of messages passed to the die()
>> +    and error() functions.
>> +
>> +<elapsed_time_to_exit> is the elapsed time from start until exit()
>> +    was called or main() returned.
>> +
>> +<elapsed_time_to_atexit> is the elapsed time from start until the slog
>> +    atexit routine was called.  This time will include any time
>> +    required to shut down or wait for the pager to complete.
>> +
> 
> I wonder how valuable the difference is between these two and if we 
> could simply drop the <elapsed_time_to_atexit>
> 
>> +<signal_value> is present if the command was stopped by a single,
> 
> s/single/signal
> 
>> +    such as a SIGPIPE when the pager is quit.
>> +
>> +<git_version> is the git version number as reported by "git version".
>> +
>> +<slog_version> is the SLOG format version.
>> +
>> +<slog_{detail,timers,aux}> are the values of the corresponding
>> +    "slog.{detail,timers,aux}" config setting.  Since these values
>> +    control optional SLOG features and filtering, these are present
>> +    to help post-processors know if an expected event did not happen
>> +    or was simply filtered out.  (Described later)
>> +
>> +<other_config_settings> is a place for developers to add additional
>> +    important config settings to the log.  This is not intended as a
>> +    dumping ground for all config settings, but rather only ones that
>> +    might affect performance or allow A/B testing in production.
>> +
>> +<timers> is a structure of any SLOG timers used during the process.
>> +    (Described later)
>> +
>> +<aux> is a structure of any "aux data" generated during the process.
>> +    (Described later)
>> +
>> +<child_summary> is a structure summarizing child processes by class.
>> +    (Described later)
>> +
>> +
>> +"event": "detail" and config setting "slog.detail"
>> +--------------------------------------------------
>> +
>> +The "detail" event is used to report progress and/or debug information
>> +during a command.  It is a line-by-line (rather than summary) event.
>> +Like GIT_TRACE_<key>, detail events are classified by "category" and
>> +may be included or omitted based on the "slog.detail" config setting.
>> +
>> +Here are 3 example "detail" events:
>> +
>> +{
>> +  "event": "detail",
>> +  "clock_us": 1530273485479387,
>> +  "pid": 107253,
>> +  "sid": "1530273485473820-107253",
>> +  "command": "status",
>> +  "detail": {
>> +    "category": "index",
>> +    "label": "lazy_init_name_hash",
>> +    "data": {
>> +      "cache_nr": 3269,
>> +      "elapsed_us": 195,
>> +      "dir_count": 0,
>> +      "dir_tablesize": 4096,
>> +      "name_count": 3269,
>> +      "name_tablesize": 4096
>> +    }
>> +  }
>> +}
>> +{
>> +  "event": "detail",
>> +  "clock_us": 1530283184051338,
>> +  "pid": 109679,
>> +  "sid": "1530283180782876-109679",
>> +  "command": "fetch",
>> +  "detail": {
>> +    "category": "child",
>> +    "label": "child_starting",
>> +    "data": {
>> +      "child_id": 3,
>> +      "git_cmd": true,
>> +      "use_shell": false,
>> +      "is_interactive": false,
>> +      "child_argv": [
>> +        "gc",
>> +        "--auto"
>> +      ]
>> +    }
>> +  }
>> +}
>> +{
>> +  "event": "detail",
>> +  "clock_us": 1530283184053158,
>> +  "pid": 109679,
>> +  "sid": "1530283180782876-109679",
>> +  "command": "fetch",
>> +  "detail": {
>> +    "category": "child",
>> +    "label": "child_ended",
>> +    "data": {
>> +      "child_id": 3,
>> +      "git_cmd": true,
>> +      "use_shell": false,
>> +      "is_interactive": false,
>> +      "child_argv": [
>> +        "gc",
>> +        "--auto"
>> +      ],
>> +      "child_pid": 109684,
>> +      "child_exit_code": 0,
>> +      "child_elapsed_us": 1819
>> +    }
>> +  }
>> +}
>> +
>> +A detail event contains the common F1 described earlier.  It also
>> +contains 2 fixed fields and 1 variable field:
>> +
>> +    "detail.category" : <detail_category>
>> +    "detail.label"    : <detail_label>
>> +    "detail.data"     : <detail_data>
>> +
>> +<detail_category> is the "category" name for the event.  This is
>> +    similar to GIT_TRACE_<key>.  In the example above we have 1
>> +    "index" and 2 "child" category events.
>> +
>> +    If the config setting "slog.detail" is true or contains this
>> +    category name, the event will be generated.  If "slog.detail"
>> +    is false, no detail events will be generated.
>> +
>> +    $ git config slog.detail true
>> +    $ git config slog.detail child,index,status
>> +    $ git config slog.detail false
>> +
>> +<detail_label> is a descriptive label for the event.  It may be the
>> +    name of a function or any meaningful value.
>> +
>> +<detail_data> is a JSON structure containing context-specific data for
>> +    the event.  This replaces the need for printf-like trace messages.
> 
> Hmm, couldn't this (a new "Trace Detail Event) be used to help replace 
> the existing GIT_TRACE mechanism? I think one missing piece is a 
> formatting option other than "pretty." Perhaps another field that is a 
> format string that can be used to give it a richer human readable output?

Maybe.  Let me take a look and see what that would look like.

> 
> Overall this looks very robust and capable.  Clearly my biggest feedback 
> is that I'd like to see this be able to replace the existing GIT_TRACE 
> and GIT_TRACE_PERFORMANCE support.  Having two different mechanisms 
> seems unnecessarily complex and will increase the mental burden when 
> adding tracing.  It also doubles the code we need to 
> write/debug/document/support.  I'd hate to see callers of the tracing 
> code having to double up and including calls to both tracing mechanisms.
> 
> I'm OK if this is a goal and migrating the existing calls to GIT_TRACE 
> are cleaned up over time.  It may be a nice intermediate step if the 
> existing macros could be re-implemented to sit on top of the new SLOG 
> infrastructure. This would certainly make it easier to migrate the code 
> base over time.
> 
> <snip>

Thanks,
Jeff


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 01/25] structured-logging: design document
  2018-07-13 16:55 ` [PATCH v1 01/25] structured-logging: design document git
  2018-07-14  8:34   ` Simon Ruderich
  2018-08-03 15:26   ` Ben Peart
@ 2018-08-21  4:47   ` Jonathan Nieder
  2 siblings, 0 replies; 38+ messages in thread
From: Jonathan Nieder @ 2018-08-21  4:47 UTC (permalink / raw)
  To: git; +Cc: git, gitster, peff, Jeff Hostetler

Hi,

Jeff Hostetler wrote:

> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  Documentation/technical/structured-logging.txt | 816 +++++++++++++++++++++++++
>  1 file changed, 816 insertions(+)
>  create mode 100644 Documentation/technical/structured-logging.txt

Can you add this to Documentation/Makefile as well, so that an HTML
version will appear once this is merged?  See
https://public-inbox.org/git/20180814222846.GG142615@aiede.svl.corp.google.com/
for an example.

[...]
> +++ b/Documentation/technical/structured-logging.txt
> @@ -0,0 +1,816 @@
> +Structured Logging
> +==================
> +
> +Structured Logging (SLOG) is an optional feature to allow Git to
> +generate structured log data for executed commands.  This includes
> +command line arguments, command run times, error codes and messages,
> +child process information, time spent in various critical functions,
> +and repository data-shape information.  Data is written to a target
> +log file in JSON[1,2,3] format.

I like the idea of more structured logs for tracing, monitoring, and
diagnosis (see also
https://research.google.com/archive/papers/dapper-2010-1.pdf on the
subject of tracing).  My main focus in looking over this initial
contribution is

 1. Is this something that could eventually merge with our other
    tracing APIs (e.g. trace_printf), or will they remain separate?

 2. Is the API introduced here one that will be easy to work with
    and adapt?  What will calling code that makes use of it look like?

[...]
> +Background (Git Merge 2018 Barcelona)
> +=====================================
> +
> +Performance and/or error logging was discussed during the contributor's
> +summit in Barcelona.  Here are the relevant notes from the meeting
> +minutes[6].
> +
> +> Performance misc (Ævar)
> +> -----------------------
> +> [...]
> +>  - central error reporting for git
> +>    - `git status` logging
> +>    - git config that collects data, pushes to known endpoint with `git push`
> +>    - pre_command and post_command hooks, for logs
> +>    - `gvfs diagnose` that looks at packfiles, etc
[etc]

I'm not sure what to make of this section.  Is it a historical
artifact, or should I consider it part of the design?

[...]
> +A Quick Example
> +===============
> +
> +Note: JSON pretty-printing is enabled in all of the examples shown in
> +this document.  When pretty-printing is turned off, each event is
> +written on a single line.  Pretty-printing is intended for debugging.
> +It should be turned off in production to make post-processing easier.
> +
> +    $ git config slog.pretty <bool>
> +
> +Here is a quick example showing SLOG data for "git status".  This
> +example has all optional features turned off.  It contains 2 events.

Thanks for the example!  I'd also be interested in what the tracing
code looks like, since the API is the 'deepest' change involved (the
tracing format can easily change later based on experience).

My initial reaction to the example trace is that it feels like an odd
compromise: on one hand it uses text (JSON) to be human-readable, and
on the other hand it is structured to be machine-readable.  The JSON
writer isn't using a standard JSON serialization library so parsing
difficulties seem likely, and the output is noisy enough that reading
it by hand is hard, too.  That leads me to wish for a different
format, like protobuf (where the binary form is very concise and
parseable, and the textual form is IMHO more readable than JSON).

All that said, this is just the first tracing output format and it's
easy to make new ones later.  It seems fine for that purpose.

[...]
> +Target Log File
> +===============
> +
> +SLOG writes events to a log file.  File logging works much like
> +GIT_TRACE where events are appended to a file on disk.

Does it write with O_APPEND?  Does it do anything to guard against
interleaved events --- e.g. are messages buffered and then written
with a single write() call?

[...]
> +Comparison with GIT_TRACE
> +=========================
> +
> +SLOG is very similar to the existing GIT_TRACE[4] API because both
> +write event messages at various points during a command.  However,
> +there are some fundamental differences that warrant it being
> +considered a separate feature rather than just another
> +GIT_TRACE_<key>:

The natural question would be in the opposite direction: can and
should trace_printf be reimplemented on top of this feature?

[...]
> +    
> +    

nit: trailing whitespace (you can find these with "git show --check").

[...]
> +A session id (SID) is a cheap, unique-enough string to associate all
> +of the events generated by a single process.  A child git process inherits
> +the SID of their parent git process and incorporates it into their SID.

I wonder if we can structure events in a more hierarchical way to
avoid having to special-case the toplevel command in this way.

[...]
> +    # The "cmd_exit" event includes the command result and elapsed
> +    # time and the various configuration settings.  During the run
> +    # "index" category timers were placed around the do_read_index()
> +    # and "preload()" calls and various "status" category timers were
> +    # placed around the 3 major parts of the status computation.
> +    # Lastly, an "index" category "aux" data item was added to report
> +    # the size of the index.

Especially for tracing and monitoring applications, it would be
helpful if more of the information were emitted earlier instead of
waiting for exit.  Especially if I am trying to trace the cause of a
hanging command, information that is only printed at exit does not
help me.

The main missing bit in this doc is a sketch of the API.  Looking
forward to finding that in the header. ;-)

Sorry for the slow review, and thanks for your thoughtful work.

Sincerely,
Jonathan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 02/25] structured-logging: add STRUCTURED_LOGGING=1 to Makefile
  2018-07-13 16:55 ` [PATCH v1 02/25] structured-logging: add STRUCTURED_LOGGING=1 to Makefile git
@ 2018-08-21  4:49   ` Jonathan Nieder
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Nieder @ 2018-08-21  4:49 UTC (permalink / raw)
  To: git; +Cc: git, gitster, peff, Jeff Hostetler

git@jeffhostetler.com wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach the Makefile to take STRUCTURED_LOGGING=1 variable to
> compile in/out structured logging feature.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  Makefile             |  8 ++++++++
>  structured-logging.c |  9 +++++++++
>  structured-logging.h | 13 +++++++++++++
>  3 files changed, 30 insertions(+)
>  create mode 100644 structured-logging.c
>  create mode 100644 structured-logging.h

This should probably be squashed with a later patch (e.g., patch 3).
When taken alone, it produces

[...]
> --- /dev/null
> +++ b/structured-logging.c
> @@ -0,0 +1,9 @@
> +#if !defined(STRUCTURED_LOGGING)
> +/*
> + * Structured logging is not available.
> + * Stub out all API routines.
> + */
> +
> +#else
> +
> +#endif

which is not idiomatic (for example, it's missing a #include of
git-compat-util.h, etc).

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 03/25] structured-logging: add structured logging framework
  2018-07-13 16:55 ` [PATCH v1 03/25] structured-logging: add structured logging framework git
  2018-07-26  9:09   ` SZEDER Gábor
@ 2018-08-21  5:05   ` Jonathan Nieder
  1 sibling, 0 replies; 38+ messages in thread
From: Jonathan Nieder @ 2018-08-21  5:05 UTC (permalink / raw)
  To: git; +Cc: git, gitster, peff, Jeff Hostetler

Jeff Hostetler wrote:

[...]
> --- a/compat/mingw.h
> +++ b/compat/mingw.h
> @@ -144,8 +144,15 @@ static inline int fcntl(int fd, int cmd, ...)
>  	errno = EINVAL;
>  	return -1;
>  }
> +
>  /* bash cannot reliably detect negative return codes as failure */
> +#if defined(STRUCTURED_LOGGING)

Git usually spells this as #ifdef.

> +#include "structured-logging.h"
> +#define exit(code) exit(strlog_exit_code((code) & 0xff))
> +#else
>  #define exit(code) exit((code) & 0xff)
> +#endif

This is hard to follow, since it only makes sense in combination with
the corresponding code in git-compat-util.h.  Can they be defined
together?  If not, can they have comments to make it easier to know to
edit one too when editing the other?

[...]
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1239,4 +1239,13 @@ extern void unleak_memory(const void *ptr, size_t len);
>  #define UNLEAK(var) do {} while (0)
>  #endif
>  
> +#include "structured-logging.h"

Is this #include needed?  Usually git-compat-util.h only defines C
standard library functions or utilities that are used everywhere.

[...]
> --- a/git.c
> +++ b/git.c
[...]
> @@ -700,7 +701,7 @@ static int run_argv(int *argcp, const char ***argv)
>  	return done_alias;
>  }
>  
> -int cmd_main(int argc, const char **argv)
> +static int real_cmd_main(int argc, const char **argv)
>  {
>  	const char *cmd;
>  	int done_help = 0;
> @@ -779,3 +780,8 @@ int cmd_main(int argc, const char **argv)
>  
>  	return 1;
>  }
> +
> +int cmd_main(int argc, const char **argv)
> +{
> +	return slog_wrap_main(real_cmd_main, argc, argv);
> +}

Can real_cmd_main get a different name, describing what it does?

[...]
> --- a/structured-logging.c
> +++ b/structured-logging.c
> @@ -1,3 +1,10 @@
[...]
> +static uint64_t my__start_time;
> +static uint64_t my__exit_time;
> +static int my__is_config_loaded;
> +static int my__is_enabled;
> +static int my__is_pretty;
> +static int my__signal;
> +static int my__exit_code;
> +static int my__pid;
> +static int my__wrote_start_event;
> +static int my__log_fd = -1;

Please don't use this my__ notation.  The inconsistency with the rest
of Git makes the code feel out of place and provides an impediment to
smooth reading.

[...]
> +static void emit_start_event(void)
> +{
> +	struct json_writer jw = JSON_WRITER_INIT;
> +
> +	/* build "cmd_start" event message */
> +	jw_object_begin(&jw, my__is_pretty);
> +	{
> +		jw_object_string(&jw, "event", "cmd_start");
> +		jw_object_intmax(&jw, "clock_us", (intmax_t)my__start_time);
> +		jw_object_intmax(&jw, "pid", (intmax_t)my__pid);

The use of blocks here is unexpected and makes me wonder what kind of
macro wizardry is going on.  Perhaps this is idiomatic for the
json-writer API; if so, can you add an example to json-writer.h to
help the next surprised reader?

That said, I think

	json_object_begin(&jw, ...);

	json_object_string(...
	json_object_int(...
	...

	json_object_begin_inline_array(&jw, "argv");
	for (k = 0; k < argv.argc; k++)
		json_object_string(...
	json_object_end(&jw);

	json_object_end(&jw);

is still readable and less unexpected.

[...]
> +static void emit_exit_event(void)
> +{
> +	struct json_writer jw = JSON_WRITER_INIT;
> +	uint64_t atexit_time = getnanotime() / 1000;
> +
> +	/* close unterminated forms */

What are unterminated forms?

> +	if (my__errors.json.len)
> +		jw_end(&my__errors);
[...]
> +int slog_default_config(const char *key, const char *value)
> +{
> +	const char *sub;
> +
> +	/*
> +	 * git_default_config() calls slog_default_config() with "slog.*"
> +	 * k/v pairs.  git_default_config() MAY or MAY NOT be called when
> +	 * cmd_<command>() calls git_config().

No need to shout.

[...]
> +/*
> + * If cmd_<command>() did not cause slog_default_config() to be called
> + * during git_config(), we try to lookup our config settings the first
> + * time we actually need them.
> + *
> + * (We do this rather than using read_early_config() at initialization
> + * because we want any "-c key=value" arguments to be included.)
> + */

Which function is initialization referring to here?

Lazy loading in order to guarantee loading after a different subsystem
sounds a bit fragile, so I wonder if we can make the sequencing more
explicit.

Stopping here.  I still like where this is going, but some aspects of
the coding style are making it hard to see the forest for the trees.
Perhaps some more details about API in the design doc would help.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 00/25] RFC: structured logging
  2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
                   ` (25 preceding siblings ...)
  2018-07-13 18:51 ` [PATCH v1 00/25] RFC: structured logging David Lang
@ 2018-08-28 17:38 ` Junio C Hamano
  2018-08-28 18:47   ` Jeff Hostetler
  26 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2018-08-28 17:38 UTC (permalink / raw)
  To: git; +Cc: git, peff, Jeff Hostetler

git@jeffhostetler.com writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> This RFC patch series adds structured logging to git.  The motivation,
> ...
>
> Jeff Hostetler (25):
>   structured-logging: design document
>   structured-logging: add STRUCTURED_LOGGING=1 to Makefile
>   structured-logging: add structured logging framework
>   structured-logging: add session-id to log events
>   structured-logging: set sub_command field for branch command
>   structured-logging: set sub_command field for checkout command
>   structured-logging: t0420 basic tests
>   structured-logging: add detail-event facility
>   structured-logging: add detail-event for lazy_init_name_hash
>   structured-logging: add timer facility
>   structured-logging: add timer around do_read_index
>   structured-logging: add timer around do_write_index
>   structured-logging: add timer around wt-status functions
>   structured-logging: add timer around preload_index
>   structured-logging: t0420 tests for timers
>   structured-logging: add aux-data facility
>   structured-logging: add aux-data for index size
>   structured-logging: add aux-data for size of sparse-checkout file
>   structured-logging: t0420 tests for aux-data
>   structured-logging: add structured logging to remote-curl
>   structured-logging: add detail-events for child processes
>   structured-logging: add child process classification
>   structured-logging: t0420 tests for child process detail events
>   structured-logging: t0420 tests for interacitve child_summary
>   structured-logging: add config data facility


I noticed that Travis job has been failing with a trivially fixable
failure, so I'll push out today's 'pu' with the attached applied on
top.  This may become unapplicable to the code when issues raised in
recent reviews addressed, though.

 structured-logging.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/structured-logging.c b/structured-logging.c
index 0e3f79ee48..78abcd2e59 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -593,8 +593,7 @@ void slog_set_command_name(const char *command_name)
 	 * the cmd_<command>() and/or it may be too early to force a
 	 * lazy load.
 	 */
-	if (my__command_name)
-		free(my__command_name);
+	free(my__command_name);
 	my__command_name = xstrdup(command_name);
 }
 
@@ -606,8 +605,7 @@ void slog_set_sub_command_name(const char *sub_command_name)
 	 * the cmd_<command>() and/or it may be too early to force a
 	 * lazy load.
 	 */
-	if (my__sub_command_name)
-		free(my__sub_command_name);
+	free(my__sub_command_name);
 	my__sub_command_name = xstrdup(sub_command_name);
 }
 
-- 
2.19.0-rc0-48-gb9dfa238d5



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v1 00/25] RFC: structured logging
  2018-08-28 17:38 ` Junio C Hamano
@ 2018-08-28 18:47   ` Jeff Hostetler
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff Hostetler @ 2018-08-28 18:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, Jeff Hostetler



On 8/28/2018 1:38 PM, Junio C Hamano wrote:
> git@jeffhostetler.com writes:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> This RFC patch series adds structured logging to git.  The motivation,
>> ...
>>
>> Jeff Hostetler (25):
>>    structured-logging: design document
>>    structured-logging: add STRUCTURED_LOGGING=1 to Makefile
>>    structured-logging: add structured logging framework
>>    structured-logging: add session-id to log events
>>    structured-logging: set sub_command field for branch command
>>    structured-logging: set sub_command field for checkout command
>>    structured-logging: t0420 basic tests
>>    structured-logging: add detail-event facility
>>    structured-logging: add detail-event for lazy_init_name_hash
>>    structured-logging: add timer facility
>>    structured-logging: add timer around do_read_index
>>    structured-logging: add timer around do_write_index
>>    structured-logging: add timer around wt-status functions
>>    structured-logging: add timer around preload_index
>>    structured-logging: t0420 tests for timers
>>    structured-logging: add aux-data facility
>>    structured-logging: add aux-data for index size
>>    structured-logging: add aux-data for size of sparse-checkout file
>>    structured-logging: t0420 tests for aux-data
>>    structured-logging: add structured logging to remote-curl
>>    structured-logging: add detail-events for child processes
>>    structured-logging: add child process classification
>>    structured-logging: t0420 tests for child process detail events
>>    structured-logging: t0420 tests for interacitve child_summary
>>    structured-logging: add config data facility
> 
> 
> I noticed that Travis job has been failing with a trivially fixable
> failure, so I'll push out today's 'pu' with the attached applied on
> top.  This may become unapplicable to the code when issues raised in
> recent reviews addressed, though.
> 
>   structured-logging.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/structured-logging.c b/structured-logging.c
> index 0e3f79ee48..78abcd2e59 100644
> --- a/structured-logging.c
> +++ b/structured-logging.c
> @@ -593,8 +593,7 @@ void slog_set_command_name(const char *command_name)
>   	 * the cmd_<command>() and/or it may be too early to force a
>   	 * lazy load.
>   	 */
> -	if (my__command_name)
> -		free(my__command_name);
> +	free(my__command_name);
>   	my__command_name = xstrdup(command_name);
>   }
>   
> @@ -606,8 +605,7 @@ void slog_set_sub_command_name(const char *sub_command_name)
>   	 * the cmd_<command>() and/or it may be too early to force a
>   	 * lazy load.
>   	 */
> -	if (my__sub_command_name)
> -		free(my__sub_command_name);
> +	free(my__sub_command_name);
>   	my__sub_command_name = xstrdup(sub_command_name);
>   }
>   
> 

Sorry about that.

Let me withdraw the current series.  I'm working on a new version that
addresses the comments on the mailing list.  It combines my logging
with a variation on the nested perf logging that Duy suggested and
the consolidation that you were talking about last week.

Jeff


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2018-08-28 18:47 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-13 16:55 [PATCH v1 00/25] RFC: structured logging git
2018-07-13 16:55 ` [PATCH v1 01/25] structured-logging: design document git
2018-07-14  8:34   ` Simon Ruderich
2018-08-03 15:26   ` Ben Peart
2018-08-09 14:30     ` Jeff Hostetler
2018-08-21  4:47   ` Jonathan Nieder
2018-07-13 16:55 ` [PATCH v1 02/25] structured-logging: add STRUCTURED_LOGGING=1 to Makefile git
2018-08-21  4:49   ` Jonathan Nieder
2018-07-13 16:55 ` [PATCH v1 03/25] structured-logging: add structured logging framework git
2018-07-26  9:09   ` SZEDER Gábor
2018-07-27 12:45     ` Jeff Hostetler
2018-08-21  5:05   ` Jonathan Nieder
2018-07-13 16:56 ` [PATCH v1 04/25] structured-logging: add session-id to log events git
2018-07-13 16:56 ` [PATCH v1 05/25] structured-logging: set sub_command field for branch command git
2018-07-13 16:56 ` [PATCH v1 06/25] structured-logging: set sub_command field for checkout command git
2018-07-13 16:56 ` [PATCH v1 07/25] structured-logging: t0420 basic tests git
2018-07-13 16:56 ` [PATCH v1 08/25] structured-logging: add detail-event facility git
2018-07-13 16:56 ` [PATCH v1 09/25] structured-logging: add detail-event for lazy_init_name_hash git
2018-07-13 16:56 ` [PATCH v1 10/25] structured-logging: add timer facility git
2018-07-13 16:56 ` [PATCH v1 11/25] structured-logging: add timer around do_read_index git
2018-07-13 16:56 ` [PATCH v1 12/25] structured-logging: add timer around do_write_index git
2018-07-13 16:56 ` [PATCH v1 13/25] structured-logging: add timer around wt-status functions git
2018-07-13 16:56 ` [PATCH v1 14/25] structured-logging: add timer around preload_index git
2018-07-13 16:56 ` [PATCH v1 15/25] structured-logging: t0420 tests for timers git
2018-07-13 16:56 ` [PATCH v1 16/25] structured-logging: add aux-data facility git
2018-07-13 16:56 ` [PATCH v1 17/25] structured-logging: add aux-data for index size git
2018-07-13 16:56 ` [PATCH v1 18/25] structured-logging: add aux-data for size of sparse-checkout file git
2018-07-13 16:56 ` [PATCH v1 19/25] structured-logging: t0420 tests for aux-data git
2018-07-13 16:56 ` [PATCH v1 20/25] structured-logging: add structured logging to remote-curl git
2018-07-13 16:56 ` [PATCH v1 21/25] structured-logging: add detail-events for child processes git
2018-07-13 16:56 ` [PATCH v1 22/25] structured-logging: add child process classification git
2018-07-13 16:56 ` [PATCH v1 23/25] structured-logging: t0420 tests for child process detail events git
2018-07-13 16:56 ` [PATCH v1 24/25] structured-logging: t0420 tests for interacitve child_summary git
2018-07-13 16:56 ` [PATCH v1 25/25] structured-logging: add config data facility git
2018-07-13 18:51 ` [PATCH v1 00/25] RFC: structured logging David Lang
2018-07-16 13:29   ` Jeff Hostetler
2018-08-28 17:38 ` Junio C Hamano
2018-08-28 18:47   ` Jeff Hostetler

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).