git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@jeffhostetler.com
Cc: git@vger.kernel.org, gitster@pobox.com,
	lars.schneider@autodesk.com,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH 0/2] routines to generate JSON data
Date: Fri, 16 Mar 2018 17:18:37 -0400	[thread overview]
Message-ID: <20180316211837.GB12333@sigill.intra.peff.net> (raw)
In-Reply-To: <20180316194057.77513-1-git@jeffhostetler.com>

On Fri, Mar 16, 2018 at 07:40:55PM +0000, git@jeffhostetler.com wrote:

> This patch series adds a set of utility routines to compose data in JSON
> format into a "struct strbuf".  The resulting string can then be output
> by commands wanting to support a JSON output format.
> 
> This is a stand alone patch.  Nothing currently uses these routines.  I'm
> currently working on a series to log "telemetry" data (as we discussed
> briefly during Ævar's "Performance Misc" session [1] in Barcelona last
> week).  And I want emit the data in JSON rather than a fixed column/field
> format.  The JSON routines here are independent of that, so it made sense
> to submit the JSON part by itself.
> 
> Back when we added porcelain=v2 format to status, we talked about adding a
> JSON format.  I think the routines in this patch would let us easily do
> that, if someone were interested.  (Extending status is not on my radar
> right now, however.)

I really like the idea of being able to send our machine-readable output
in some "standard" syntax for which people may already have parsers. But
one big hangup with JSON is that it assumes all strings are UTF-8. That
may be OK for telemetry data, but it would probably lead to problems for
something like status porcelain, since Git's view of paths is just a
string of bytes (not to mention possible uses elsewhere like author
names, subject lines, etc).

Before we commit to a standardized format, I think we need to work out
a solution there (because I'd much rather not go down this road for
telemetry data only to find that we cannot use the same standardized
format in other parts of Git).

Some possible solutions I can think of:

  1. Ignore the UTF-8 requirement, making a JSON-like output (which I
     think is what your patches do). I'm not sure what problems this
     might cause on the parsing side.

  2. Specially encode non-UTF-8 bits. I'm not familiar enough with JSON
     to know the options here, but my understanding is that numeric
     escapes are just for inserting unicode code points. _Can_ you
     actually transport arbitrary binary data across JSON without
     base64-encoding it (yech)?

  3. Some other similar format. YAML comes to mind. Last time I looked
     (quite a while ago), it seemed insanely complex, but I think you
     could implement only a reasonable subset. OTOH, I think the tools
     ecosystem for parsing JSON (e.g., jq) is much better.

> Documentation for the new API is given in json-writer.h at the bottom of
> the first patch.

The API generally looks pleasant, but the nesting surprised me.  E.g.,
I'd have expected:

  jw_array_begin(out);
  jw_array_begin(out);
  jw_array_append_int(out, 42);
  jw_array_end(out);
  jw_array_end(out);

to result in an array containing an array containing an integer. But
array_begin() actually resets the strbuf, so you can't build up nested
items like this internally. Ditto for objects within objects. You have
to use two separate strbufs and copy the data an extra time.

To make the above work, I think you'd have to store a little more state.
E.g., the "array_append" functions check "out->len" to see if they need
to add a separating comma. That wouldn't work if we might be part of a
nested array. So I think you'd need a context struct like:

  struct json_writer {
    int first_item;
    struct strbuf out;
  };
  #define JSON_WRITER_INIT { 1, STRBUF_INIT }

to store the state and the output. As a bonus, you could also use it to
store some other sanity checks (e.g., keep a "depth" counter and BUG()
when somebody tries to access the finished strbuf with a hanging-open
object or array).

> I wasn't sure how to unit test the API from a shell script, so I added a
> helper command that does most of the work in the second patch.

In general I'd really prefer to keep the shell script as the driver for
the tests, and have t/helper programs just be conduits. E.g., something
like:

  cat >expect <<-\EOF &&
  {"key": "value", "foo": 42}
  EOF
  test-json-writer >actual \
    object_begin \
    object_append_string key value \
    object_append_int foo 42 \
    object_end &&
  test_cmp expect actual

It's a bit tedious (though fairly mechanical) to expose the API in this
way, but it makes it much easier to debug, modify, or add tests later
on (for example, I had to modify the C program to find out that my
append example above wouldn't work).

-Peff

  parent reply	other threads:[~2018-03-16 21:18 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-16 19:40 git
2018-03-16 19:40 ` [PATCH 1/2] json_writer: new routines to create data in JSON format git
2018-03-16 19:40 ` [PATCH 2/2] json-writer: unit test git
2018-03-16 21:18 ` Jeff King [this message]
2018-03-16 23:00   ` [PATCH 0/2] routines to generate JSON data Ævar Arnfjörð Bjarmason
2018-03-20  5:52     ` Jeff King
2018-03-17  7:38   ` Jacob Keller
2018-03-19 17:31     ` Jeff Hostetler
2018-03-19 10:19   ` Jeff Hostetler
2018-03-20  5:42     ` Jeff King
2018-03-20 16:44       ` Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180316211837.GB12333@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=lars.schneider@autodesk.com \
    --subject='Re: [PATCH 0/2] routines to generate JSON data' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).