git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, gitster@pobox.com,
	lars.schneider@autodesk.com,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH 0/2] routines to generate JSON data
Date: Mon, 19 Mar 2018 06:19:26 -0400	[thread overview]
Message-ID: <e2a9d2a9-f2e2-f285-52c3-be11668da543@jeffhostetler.com> (raw)
In-Reply-To: <20180316211837.GB12333@sigill.intra.peff.net>



On 3/16/2018 5:18 PM, Jeff King wrote:
> On Fri, Mar 16, 2018 at 07:40:55PM +0000, git@jeffhostetler.com wrote:
> 
[...]
> I really like the idea of being able to send our machine-readable output
> in some "standard" syntax for which people may already have parsers. But
> one big hangup with JSON is that it assumes all strings are UTF-8. That
> may be OK for telemetry data, but it would probably lead to problems for
> something like status porcelain, since Git's view of paths is just a
> string of bytes (not to mention possible uses elsewhere like author
> names, subject lines, etc).
[...]

I'll come back to the UTF-8/YAML questions in a separate response.


>> Documentation for the new API is given in json-writer.h at the bottom of
>> the first patch.
> 
> The API generally looks pleasant, but the nesting surprised me.  E.g.,
> I'd have expected:
> 
>    jw_array_begin(out);
>    jw_array_begin(out);
>    jw_array_append_int(out, 42);
>    jw_array_end(out);
>    jw_array_end(out);
> 
> to result in an array containing an array containing an integer. But
> array_begin() actually resets the strbuf, so you can't build up nested
> items like this internally. Ditto for objects within objects. You have
> to use two separate strbufs and copy the data an extra time.
> 
> To make the above work, I think you'd have to store a little more state.
> E.g., the "array_append" functions check "out->len" to see if they need
> to add a separating comma. That wouldn't work if we might be part of a
> nested array. So I think you'd need a context struct like:
> 
>    struct json_writer {
>      int first_item;
>      struct strbuf out;
>    };
>    #define JSON_WRITER_INIT { 1, STRBUF_INIT }
> 
> to store the state and the output. As a bonus, you could also use it to
> store some other sanity checks (e.g., keep a "depth" counter and BUG()
> when somebody tries to access the finished strbuf with a hanging-open
> object or array).

Yeah, I thought about that, but I think it gets more complex than that.
I'd need a stack of "first_item" values.  Or maybe the _begin() needs to
increment a depth and set first_item and the _end() needs to always
unset first_item.  I'll look at this gain.

The thing I liked about the bottom-up construction is that it is easier
to collect multiple sets in parallel and combine them during the final
roll-up.  With the in-line nesting, you're tempted to try to construct
the resulting JSON in a single series and that may not fit what the code
is trying to do.  For example, if I wanted to collect an array of error
messages as they are generated and an array of argv arrays and any alias
expansions, then put together a final JSON string containing them and
the final exit code, I'd need to build it in parts.  I can build these
parts in pieces of JSON and combine them at the end -- or build up other
similar data structures (string arrays, lists, or whatever) and then
have a JSON conversion step.  But we can make it work both ways, I just
wanted to keep it simpler.


>> I wasn't sure how to unit test the API from a shell script, so I added a
>> helper command that does most of the work in the second patch.
> 
> In general I'd really prefer to keep the shell script as the driver for
> the tests, and have t/helper programs just be conduits. E.g., something
> like:
> 
>    cat >expect <<-\EOF &&
>    {"key": "value", "foo": 42}
>    EOF
>    test-json-writer >actual \
>      object_begin \
>      object_append_string key value \
>      object_append_int foo 42 \
>      object_end &&
>    test_cmp expect actual
> 
> It's a bit tedious (though fairly mechanical) to expose the API in this
> way, but it makes it much easier to debug, modify, or add tests later
> on (for example, I had to modify the C program to find out that my
> append example above wouldn't work).

Yeah, I wasn't sure if such a simple api required exposing all that
machinery to the shell or not.  And the api is fairly self-contained
and not depending on a lot of disk/repo setup or anything, so my tests
would be essentially static WRT everything else.

With my t0019 script you should have been able use -x -v to see what
was failing.

> 
> -Peff
> 

thanks for the quick review
Jeff

  parent reply	other threads:[~2018-03-19 10:19 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-16 19:40 [PATCH 0/2] routines to generate JSON data git
2018-03-16 19:40 ` [PATCH 1/2] json_writer: new routines to create data in JSON format git
2018-03-16 19:40 ` [PATCH 2/2] json-writer: unit test git
2018-03-16 21:18 ` [PATCH 0/2] routines to generate JSON data Jeff King
2018-03-16 23:00   ` Ævar Arnfjörð Bjarmason
2018-03-20  5:52     ` Jeff King
2018-03-17  7:38   ` Jacob Keller
2018-03-19 17:31     ` Jeff Hostetler
2018-03-19 10:19   ` Jeff Hostetler [this message]
2018-03-20  5:42     ` Jeff King
2018-03-20 16:44       ` Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e2a9d2a9-f2e2-f285-52c3-be11668da543@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=lars.schneider@autodesk.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).