From: Jeff King <peff@peff.net>
To: Martin Langhoff <martin.langhoff@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Structured (ie: json) output for query commands?
Date: Thu, 1 Jul 2021 11:47:18 -0400
Message-ID: <YN3jhlXyTEmoBOon@coredump.intra.peff.net>
In-Reply-To: <CACPiFC+F9P1DY_Dgt4+Z-U4o5WRbRduq60+Df0+FHBn6=XL2hw@mail.gmail.com>

On Wed, Jun 30, 2021 at 02:20:09PM -0400, Martin Langhoff wrote:

> > One complication we faced is that a lot of Git's data is bag-of-bytes,
> 
> Great point -- hadn't thought of that. Don't see anything in
> json-writer.c but we do use iconv already.

We do, but the problem is deeper than that. We don't always know the
intended encoding of bytes in the repository. For commits, there's an
"encoding" header and we default to utf8 if it's not specified. But
filenames in trees do not have an encoding (nor are two entries in a
single tree even required to be in the same encoding). From Git's
perspective they really are just NUL-terminated sequences of bytes.
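
To make that concrete, here is a minimal sketch in Python (not Git's own
code) that walks the raw body of a tree object, assuming the SHA-1 object
format where each entry is "<mode> <name>", a NUL, and a 20-byte object
id. The parse_tree helper and the sample names are made up for
illustration. Note that nothing in an entry says how to decode the name.

```python
def parse_tree(body: bytes):
    """Yield (mode, name, oid_hex) for each entry in a raw tree body.

    Assumes the SHA-1 object format: b"<mode> <name>\\0" + 20 raw oid bytes.
    """
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space + 1)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul]  # raw bytes; no encoding is recorded
        oid = body[nul + 1:nul + 21].hex()
        yield mode, name, oid
        pos = nul + 21


# Hypothetical tree with two spellings of the same name in one tree:
# the first entry is UTF-8, the second is Latin-1.
body = (
    b"100644 caf\xc3\xa9.txt\0" + bytes(20)
    + b"100644 caf\xe9.txt\0" + bytes(range(20))
)
entries = list(parse_tree(body))

for mode, name, oid in entries:
    try:
        shown = name.decode("utf-8")
    except UnicodeDecodeError:
        shown = "<not valid UTF-8>"
    print(mode, repr(name), shown)
```

Both entries are perfectly valid as far as the tree format goes; only the
first one happens to decode as utf8.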

Most of the time that just works, of course. People tend to use utf8
these days anyway. And even if they aren't utf8, as long as the user's
terminal is configured to match, then everything will look OK to them
(you do have to turn off core.quotepath to see any high-bit characters
in filenames).
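
As a rough illustration of what core.quotepath changes, here is a
simplified sketch in Python. It only covers the high-bit case (wrap the
path in double quotes and write each byte with the high bit set as a
backslash plus three octal digits); the real quoting also escapes control
characters, double quotes and backslashes, which this skips. The function
name is hypothetical.

```python
def quote_high_bit(raw: bytes, quotepath: bool = True) -> bytes:
    """Simplified model of path quoting for bytes with the high bit set.

    Not a full reimplementation: control characters, double quotes and
    backslashes are deliberately ignored here.
    """
    if not quotepath or all(b < 0x80 for b in raw):
        return raw  # passed through verbatim
    out = bytearray(b'"')
    for b in raw:
        if b >= 0x80:
            out += b"\\%03o" % b  # e.g. 0xc3 becomes \303
        else:
            out.append(b)
    out += b'"'
    return bytes(out)


name = b"caf\xc3\xa9.txt"  # "café.txt" in UTF-8
quoted = quote_high_bit(name)                     # quotepath on
verbatim = quote_high_bit(name, quotepath=False)  # quotepath off
print(quoted)
print(verbatim)
```

With quoting off, the bytes reach the terminal untouched, so they display
correctly only if the terminal's encoding matches whatever was committed.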

So in practice I suspect it is fine to just output them as-is in JSON.
Things will Just Work for people using utf8 consistently. People using
other encodings will have things look OK in their terminal, but JSON
parsers would probably choke on the non-utf8 bytes. We could provide an
option to say "when you generate JSON, assume paths are in encoding XYZ
(say, latin1) and convert to utf8". That wouldn't help people who have
mix-and-match encodings in their trees, but that seems even rarer.
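
A minimal sketch of what such an option might do, written in Python
rather than against json-writer.c. The path_to_json helper and its
assumed_encoding parameter are hypothetical stand-ins for the option
described above, not an existing Git interface.

```python
import json


def path_to_json(raw: bytes, assumed_encoding: str = "utf-8") -> str:
    """Decode a raw path with a caller-supplied encoding and emit JSON."""
    # Raises UnicodeDecodeError if the bytes do not fit the assumed encoding.
    text = raw.decode(assumed_encoding)
    # ensure_ascii=False keeps non-ASCII characters literal; the caller
    # would write the resulting string out as UTF-8.
    return json.dumps({"path": text}, ensure_ascii=False)


utf8_name = b"caf\xc3\xa9.txt"  # "café.txt" in UTF-8
latin1_name = b"caf\xe9.txt"    # "café.txt" in Latin-1

from_utf8 = path_to_json(utf8_name)
from_latin1 = path_to_json(latin1_name, "latin-1")  # same JSON as above

# Mix-and-match trees: a UTF-8 name decoded as Latin-1 does not fail,
# it silently turns into mojibake.
mojibake = path_to_json(utf8_name, "latin-1")

# Emitting the Latin-1 bytes as-is yields a document that Python's own
# JSON parser rejects.
as_is = b'{"path": "' + latin1_name + b'"}'
try:
    json.loads(as_is)
    rejected = False
except ValueError:
    rejected = True
```

The single assumed encoding is what keeps this simple, and it is also why
it cannot help a tree that mixes encodings: there is no per-entry
information to pick the right decoder.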

-Peff
