From: Jeff Hostetler <git@jeffhostetler.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, gitster@pobox.com,
lars.schneider@autodesk.com,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH 0/2] routines to generate JSON data
Date: Mon, 19 Mar 2018 06:19:26 -0400 [thread overview]
Message-ID: <e2a9d2a9-f2e2-f285-52c3-be11668da543@jeffhostetler.com> (raw)
In-Reply-To: <20180316211837.GB12333@sigill.intra.peff.net>
On 3/16/2018 5:18 PM, Jeff King wrote:
> On Fri, Mar 16, 2018 at 07:40:55PM +0000, git@jeffhostetler.com wrote:
>
[...]
> I really like the idea of being able to send our machine-readable output
> in some "standard" syntax for which people may already have parsers. But
> one big hangup with JSON is that it assumes all strings are UTF-8. That
> may be OK for telemetry data, but it would probably lead to problems for
> something like status porcelain, since Git's view of paths is just a
> string of bytes (not to mention possible uses elsewhere like author
> names, subject lines, etc).
[...]
I'll come back to the UTF-8/YAML questions in a separate response.
>> Documentation for the new API is given in json-writer.h at the bottom of
>> the first patch.
>
> The API generally looks pleasant, but the nesting surprised me. E.g.,
> I'd have expected:
>
> jw_array_begin(out);
> jw_array_begin(out);
> jw_array_append_int(out, 42);
> jw_array_end(out);
> jw_array_end(out);
>
> to result in an array containing an array containing an integer. But
> array_begin() actually resets the strbuf, so you can't build up nested
> items like this internally. Ditto for objects within objects. You have
> to use two separate strbufs and copy the data an extra time.
>
> To make the above work, I think you'd have to store a little more state.
> E.g., the "array_append" functions check "out->len" to see if they need
> to add a separating comma. That wouldn't work if we might be part of a
> nested array. So I think you'd need a context struct like:
>
> struct json_writer {
> int first_item;
> struct strbuf out;
> };
> #define JSON_WRITER_INIT { 1, STRBUF_INIT }
>
> to store the state and the output. As a bonus, you could also use it to
> store some other sanity checks (e.g., keep a "depth" counter and BUG()
> when somebody tries to access the finished strbuf with a hanging-open
> object or array).
Yeah, I thought about that, but I think it gets more complex than that.
I'd need a stack of "first_item" values. Or maybe the _begin() needs to
increment a depth and set first_item and the _end() needs to always
unset first_item. I'll look at this gain.
The thing I liked about the bottom-up construction is that it is easier
to collect multiple sets in parallel and combine them during the final
roll-up. With the in-line nesting, you're tempted to try to construct
the resulting JSON in a single series and that may not fit what the code
is trying to do. For example, if I wanted to collect an array of error
messages as they are generated and an array of argv arrays and any alias
expansions, then put together a final JSON string containing them and
the final exit code, I'd need to build it in parts. I can build these
parts in pieces of JSON and combine them at the end -- or build up other
similar data structures (string arrays, lists, or whatever) and then
have a JSON conversion step. But we can make it work both ways, I just
wanted to keep it simpler.
>> I wasn't sure how to unit test the API from a shell script, so I added a
>> helper command that does most of the work in the second patch.
>
> In general I'd really prefer to keep the shell script as the driver for
> the tests, and have t/helper programs just be conduits. E.g., something
> like:
>
> cat >expect <<-\EOF &&
> {"key": "value", "foo": 42}
> EOF
> test-json-writer >actual \
> object_begin \
> object_append_string key value \
> object_append_int foo 42 \
> object_end &&
> test_cmp expect actual
>
> It's a bit tedious (though fairly mechanical) to expose the API in this
> way, but it makes it much easier to debug, modify, or add tests later
> on (for example, I had to modify the C program to find out that my
> append example above wouldn't work).
Yeah, I wasn't sure if such a simple api required exposing all that
machinery to the shell or not. And the api is fairly self-contained
and not depending on a lot of disk/repo setup or anything, so my tests
would be essentially static WRT everything else.
With my t0019 script you should have been able use -x -v to see what
was failing.
>
> -Peff
>
thanks for the quick review
Jeff
next prev parent reply other threads:[~2018-03-19 10:19 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-16 19:40 [PATCH 0/2] routines to generate JSON data git
2018-03-16 19:40 ` [PATCH 1/2] json_writer: new routines to create data in JSON format git
2018-03-16 19:40 ` [PATCH 2/2] json-writer: unit test git
2018-03-16 21:18 ` [PATCH 0/2] routines to generate JSON data Jeff King
2018-03-16 23:00 ` Ævar Arnfjörð Bjarmason
2018-03-20 5:52 ` Jeff King
2018-03-17 7:38 ` Jacob Keller
2018-03-19 17:31 ` Jeff Hostetler
2018-03-19 10:19 ` Jeff Hostetler [this message]
2018-03-20 5:42 ` Jeff King
2018-03-20 16:44 ` Jeff Hostetler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e2a9d2a9-f2e2-f285-52c3-be11668da543@jeffhostetler.com \
--to=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=lars.schneider@autodesk.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).