From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Jeff King <peff@peff.net>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: Structured (ie: json) output for query commands?
Date: Thu, 1 Jul 2021 21:18:01 +0000
Message-ID: <YN4xCRDi3JwMc+S0@camp.crustytoothpaste.net>
In-Reply-To: <YN3mk0LnyJyuQ+9T@coredump.intra.peff.net>
On 2021-07-01 at 16:00:19, Jeff King wrote:
> On Wed, Jun 30, 2021 at 08:19:49PM +0000, brian m. carlson wrote:
>
> > On 2021-06-30 at 17:59:43, Jeff King wrote:
> > > One complication we faced is that a lot of Git's data is bag-of-bytes,
> > > not utf8. And json technically requires utf8. I don't remember if we
> > > simply fudged that and output possibly non-utf8 sequences, or if we
> > > actually encode them.
> >
> > I think we just emit invalid UTF-8 in that case, which is a problem.
> > That's why Git is not well suited to JSON output and why it isn't a good
> > choice for structured data here. I'd like us not to do more JSON in our
> > codebase, since it's practically impossible for users to depend on our
> > output if we do that due to encoding issues[0].
> >
> > We could emit data in a different format, such as YAML, which does have
> > encoding for arbitrary byte sequences. However, in YAML, binary data is
> > always base64 encoded, which is less readable, although still
> > interchangeable. CBOR is also a possibility, although it's not human
> > readable at all.
>
> I don't love the invalid-utf8-in-json thing in general. But I think it
> may be the least-bad solution. I seem to recall that YAML has its own
> complexities, and losing human-readability (even to base64) is a pretty
> big downside. And the tooling for working with json seems more common
> and mature (certainly over something like CBOR, but I think even YAML
> doesn't have anything nearly as nice as jq).

I'm not opposed to JSON as long as we don't lay landmines. We could
URI-encode (percent-encode) any field that is a bag of bytes, which gives
people the niceties of JSON without breakage when the data isn't valid
UTF-8. Most things will still be human-readable.
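To make the URI-encoding idea concrete, here is a minimal sketch. It is written in Python purely for illustration (Git itself is C), and the helper names encode_bytes_field and decode_bytes_field are hypothetical, not anything in Git. It assumes "/" is left unescaped so paths stay readable; every other byte outside RFC 3986's unreserved set becomes %XX, so the output is pure ASCII and always valid JSON.

```python
import json
from urllib.parse import quote, unquote_to_bytes


def encode_bytes_field(raw: bytes) -> str:
    """Hypothetical helper: percent-encode a bag-of-bytes field for JSON.

    Bytes outside RFC 3986's unreserved set become %XX, except "/", which
    is kept (an assumption) so paths stay readable. The result is pure
    ASCII, hence valid UTF-8 and safe inside a JSON string.
    """
    return quote(raw, safe="/")


def decode_bytes_field(field: str) -> bytes:
    """Hypothetical consumer-side helper: recover the exact original bytes."""
    return unquote_to_bytes(field)


# A Latin-1 filename (invalid as UTF-8) plus a literal "%" and a space.
raw_path = b"t/caf\xe9 100%.txt"
line = json.dumps({"path": encode_bytes_field(raw_path)})
```

Plain ASCII names pass through almost untouched, but valid non-ASCII UTF-8 is escaped too (e.g. "é" becomes %C3%A9). A variant that escapes only "%" and invalid byte sequences would be more readable, at the cost of a slightly more complex decoder.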

We could even have --json be an alias for --json=encoded (URI-encoding)
and also have --json=strict for the situation where you assert that
everything is valid UTF-8 and explicitly say you want us to die() if
we see non-UTF-8. I don't want us to say that something is JSON and
then emit junk, since that's a bad user experience.
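The two proposed modes could behave roughly like this sketch. Again this is illustrative Python, not Git code: the mode strings mirror the --json=encoded / --json=strict spellings proposed above, which are not existing Git options, and NonUTF8Error is a hypothetical stand-in for die().

```python
import json
from typing import Mapping
from urllib.parse import quote


class NonUTF8Error(Exception):
    """Hypothetical stand-in for Git's die() in strict mode."""


def encode_field(raw: bytes, mode: str = "encoded") -> str:
    """Encode one bag-of-bytes field according to a proposed --json=<mode>.

    "encoded": percent-encode, so any byte sequence yields valid JSON.
    "strict":  require valid UTF-8 and fail loudly instead of emitting junk.
    """
    if mode == "encoded":
        return quote(raw, safe="/")
    if mode == "strict":
        try:
            return raw.decode("utf-8")  # raises on invalid UTF-8 sequences
        except UnicodeDecodeError as err:
            raise NonUTF8Error(
                f"fatal: non-UTF-8 data at byte offset {err.start}"
            ) from err
    raise ValueError(f"unknown --json mode: {mode!r}")


def emit_record(record: Mapping[str, bytes], mode: str = "encoded") -> str:
    """Serialize one output record; a bare --json would map to "encoded"."""
    fields = {key: encode_field(value, mode) for key, value in record.items()}
    # ensure_ascii=False keeps strict-mode UTF-8 text readable in the output.
    return json.dumps(fields, ensure_ascii=False)
```

A script that knows its repositories are clean can ask for strict mode and get readable text; anything else gets the encoded default, and neither path ever produces output that a strict JSON parser would reject.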

Ideally, we'd have some generic serializer support for this case, so that
if people _do_ want to add YAML or CBOR output, it can be plugged in.
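One possible shape for generic serializer support is sketched below, under the assumption that commands build records whose fields are either known text (str) or a bag of bytes (bytes), and each backend decides only how to represent raw bytes and how to write the final document. All class names are hypothetical; Base64LinesSerializer is a toy stand-in for a YAML-style backend that base64-encodes binary, not real YAML.

```python
import base64
import json
from typing import Dict, List, Union
from urllib.parse import quote

# A field is either known-good text (str) or a bag of bytes (bytes).
Field = Union[str, bytes]


class Serializer:
    """Hypothetical backend interface shared by every output format."""

    def encode_bytes(self, raw: bytes) -> str:
        raise NotImplementedError

    def dump(self, records: List[Dict[str, str]]) -> str:
        raise NotImplementedError

    def serialize(self, records: List[Dict[str, Field]]) -> str:
        # Only bytes fields go through the backend's binary encoding.
        converted = [
            {k: self.encode_bytes(v) if isinstance(v, bytes) else v
             for k, v in rec.items()}
            for rec in records
        ]
        return self.dump(converted)


class JSONSerializer(Serializer):
    def encode_bytes(self, raw: bytes) -> str:
        return quote(raw, safe="/")  # percent-encoding keeps JSON valid

    def dump(self, records: List[Dict[str, str]]) -> str:
        return json.dumps(records)


class Base64LinesSerializer(Serializer):
    """Toy YAML-like backend: base64 for binary, key=value lines."""

    def encode_bytes(self, raw: bytes) -> str:
        return base64.b64encode(raw).decode("ascii")

    def dump(self, records: List[Dict[str, str]]) -> str:
        return "\n".join(
            " ".join(f"{k}={v}" for k, v in rec.items()) for rec in records
        )
```

A CBOR backend, which has a native byte-string type, would want to pass bytes through unchanged, so a fuller interface would not force every field through str.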

> Our sloppy json encoding does work correctly if you use utf8 paths, and
> I think we could provide options to cover other common cases (e.g., a
> single option for "assume my paths are latin1"). I think life is hardest
> on somebody writing a script/service which is meant to process arbitrary
> repositories (and isn't in control of the strictness of whatever is
> parsing the json).

I think I'd rather provide a general encoding mechanism than try to
handle individual legacy encodings. I _do_ want people to be able to do
things like store arbitrary bytes in paths, because many people use that
functionality to ship test files that verify their code works correctly
on Unix systems. I also want us to handle arbitrary bytes wherever we've
stated that we support them (e.g., in refs). I _don't_ want to encourage
people to use non-UTF-8 text encodings, because I firmly believe those
are obsolete.

So, correct binary data support, yes; non-UTF-8 text, no.
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA