git@vger.kernel.org mailing list mirror (one of many)
From: Josh Steadmon <steadmon@google.com>
To: git@vger.kernel.org, gitster@pobox.com, git@jeffhostetler.com,
	avarab@gmail.com, peff@peff.net, jnareb@gmail.com
Subject: [PATCH v3 0/3] Add a JSON Schema for trace2 events
Date: Wed, 24 Jul 2019 16:06:50 -0700
Message-ID: <cover.1564009259.git.steadmon@google.com>
In-Reply-To: <cover.1560295286.git.steadmon@google.com>

This is a proof-of-concept series that formalizes the structure of trace2 event
output using JSON-Schema [1].

It provides a validator (written in Go) that verifies the events in a given
trace2 event output file match the schema. I am happy to rewrite this validator
in some other language, provided that the language has a JSON-Schema library
supporting at least draft-04.

It runs the validator as part of the CI suite (this increases the runtime
by about 15 minutes). It tests that the trace output of "make test"
conforms to the schema. Users of the trace2 event output can be
relatively confident that the output format has not changed so long as
the schema file remains the same and the regression test is passing.

I would appreciate any feedback on better ways to integrate the
validator into the CI suite.

I have not added support for standalone schema validators (as requested
in the discussion of V1 of this series) because the few that I tested on
my workstation ran for multiple hours (vs. 15 minutes for the validator
included in this series). If someone can suggest a performant standalone
validator, I will be happy to test that.

[1]: https://json-schema.org/

Changes since V2 of this series:
* corrected commit message regarding the different schema variations
* cleaned up the Makefile
* added comment noting that the validator expects JSON-Lines input
* added a --progress flag to the validator
* improved validation error output

Changes since V1 of this series:
* dropped the documentation fix, as it can be submitted separately from
  this series
* added JSON-array versions of the schema (currently unused)
* added the validation test to the CI suite

Josh Steadmon (3):
  trace2: Add a JSON schema for trace2 events
  trace2: add a schema validator for trace2 events
  ci: run trace2 schema validation in the CI suite

 ci/run-build-and-tests.sh                     |   6 +
 t/trace_schema_validator/.gitignore           |   1 +
 t/trace_schema_validator/Makefile             |  18 +
 t/trace_schema_validator/README               |  23 +
 t/trace_schema_validator/event_schema.json    | 398 ++++++++++++++
 t/trace_schema_validator/list_schema.json     | 401 ++++++++++++++
 .../strict_list_schema.json                   | 514 ++++++++++++++++++
 t/trace_schema_validator/strict_schema.json   | 511 +++++++++++++++++
 .../trace_schema_validator.go                 |  82 +++
 9 files changed, 1954 insertions(+)
 create mode 100644 t/trace_schema_validator/.gitignore
 create mode 100644 t/trace_schema_validator/Makefile
 create mode 100644 t/trace_schema_validator/README
 create mode 100644 t/trace_schema_validator/event_schema.json
 create mode 100644 t/trace_schema_validator/list_schema.json
 create mode 100644 t/trace_schema_validator/strict_list_schema.json
 create mode 100644 t/trace_schema_validator/strict_schema.json
 create mode 100644 t/trace_schema_validator/trace_schema_validator.go

Range-diff against v2:
1:  a949db776c ! 1:  d4e82796bc trace2: Add a JSON schema for trace2 events
    @@ Commit message
         objects. This can be used to add regression tests to verify that the
         event output format does not change unexpectedly.
     
    -    Two versions of the schema are provided:
    +    Four versions of the schema are provided:
         * event_schema.json is more permissive. It verifies that all expected
    -      fields are present in each trace event, but it allows traces to have
    +      fields are present in a trace event, but it allows traces to have
           unexpected additional fields. This allows the schema to be specified
           more concisely by factoring out the common fields into a reusable
           sub-schema.
         * strict_schema.json is more restrictive. It verifies that all expected
    -      fields are present and no unexpected fields are present in each trace
    +      fields are present and no unexpected fields are present in the trace
           event. Due to this additional restriction, the common fields cannot be
           factored out into a re-usable subschema (at least as-of draft-07) [2],
           and must be repeated for each event definition.
    +    * list_schema.json is like event_schema.json above, but validates a JSON
    +      array of trace events, rather than a single event.
    +    * strict_list_schema.json is like strict_schema.json above, but
    +      validates a JSON array of trace events, rather than a single event.
     
         [1]: https://json-schema.org/
         [2]: https://json-schema.org/understanding-json-schema/reference/combining.html#allof
2:  3fa4e9eef8 ! 2:  97cb6a3eb4 trace2: add a schema validator for trace2 events
    @@ t/trace_schema_validator/.gitignore (new)
     
      ## t/trace_schema_validator/Makefile (new) ##
     @@
    ++RM = rm -f
    ++PROGRAMS = trace_schema_validator
    ++GOCMD = go
    ++GOBUILD = $(GOCMD) build
    ++GOGET = $(GOCMD) get
    ++
     +.PHONY: fetch_deps clean
     +
    ++all: $(PROGRAMS)
    ++
     +trace_schema_validator: fetch_deps trace_schema_validator.go
    -+	go build
    ++	$(GOBUILD) -o trace_schema_validator
     +
     +fetch_deps:
    -+	go get github.com/xeipuuv/gojsonschema
    ++	$(GOGET) github.com/xeipuuv/gojsonschema
     +
     +clean:
    -+	rm -f trace_schema_validator
    ++	$(RM) $(PROGRAMS)
     
      ## t/trace_schema_validator/trace_schema_validator.go (new) ##
     @@
     +// trace_schema_validator validates individual lines of an input file against a
     +// provided JSON-Schema for git trace2 event output.
     +//
    ++// Note that this expects each object to validate to be on its own line in the
    ++// input file (AKA JSON-Lines format). This is what Git natively writes with
    ++// GIT_TRACE2_EVENT enabled.
    ++//
     +// Traces can be collected by setting the GIT_TRACE2_EVENT environment variable
     +// to an absolute path and running any Git command; traces will be appended to
     +// the file.
     +//
     +// Traces can then be verified like so:
     +//   trace_schema_validator \
    -+//     --trace2_event_file /path/to/trace/output \
    -+//     --schema_file /path/to/schema
    ++//     --trace2-event-file /path/to/trace/output \
    ++//     --schema-file /path/to/schema
     +package main
     +
     +import (
    @@ t/trace_schema_validator/trace_schema_validator.go (new)
     +)
     +
     +// Required flags
    -+var schemaFile = flag.String("schema_file", "", "JSON-Schema filename")
    -+var trace2EventFile = flag.String("trace2_event_file", "", "trace2 event filename")
    ++var schemaFile = flag.String("schema-file", "", "JSON-Schema filename")
    ++var trace2EventFile = flag.String("trace2-event-file", "", "trace2 event filename")
    ++var progress = flag.Int("progress", 0, "Print progress message each time we have validated this many lines. --progress=0 means no messages are printed")
     +
     +func main() {
     +	flag.Parse()
     +	if *schemaFile == "" || *trace2EventFile == "" {
    -+		log.Fatal("Both --schema_file and --trace2_event_file are required.")
    ++		log.Fatal("Both --schema-file and --trace2-event-file are required.")
     +	}
     +	schemaURI, err := filepath.Abs(*schemaFile)
     +	if err != nil {
    @@ t/trace_schema_validator/trace_schema_validator.go (new)
     +
     +	count := 0
     +	for ; scanner.Scan(); count++ {
    -+		if count%10000 == 0 {
    -+			// Travis-CI expects regular output or it will time out.
    ++		if *progress != 0 && count%*progress == 0 {
     +			log.Print("Validated items: ", count)
     +		}
     +		event := gojsonschema.NewStringLoader(scanner.Text())
    @@ t/trace_schema_validator/trace_schema_validator.go (new)
     +			log.Fatal(err)
     +		}
     +		if !result.Valid() {
    -+			log.Print("Trace event is invalid: ", scanner.Text())
    ++			log.Printf("Trace event line %d is invalid: %s", count+1, scanner.Text())
     +			for _, desc := range result.Errors() {
     +				log.Print("- ", desc)
     +			}
3:  acf3aebcaa ! 3:  a07458b2e4 ci: run trace2 schema validation in the CI suite
    @@ ci/run-build-and-tests.sh: then
      	make test
     +	t/trace_schema_validator/trace_schema_validator \
     +		--trace2_event_file=${GIT_TRACE2_EVENT} \
    -+		--schema_file=t/trace_schema_validator/strict_schema.json
    ++		--schema_file=t/trace_schema_validator/strict_schema.json \
    ++		--progress=10000
      fi
      
      check_unignored_build_artifacts
-- 
2.22.0.709.g102302147b-goog


