git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v2] routines to generate JSON data
From: git @ 2018-03-21 19:28 UTC
  To: git; +Cc: gitster, peff, avarab, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

This is version 2 of my JSON data format routines.  This version addresses
the non-UTF-8 questions raised on v1.

It includes a new "struct json_writer" which is used to guide the
accumulation of JSON data -- knowing whether an object or array is
currently being composed.  This allows error checking during construction.

It also allows construction of nested structures using an inline model (in
addition to the original bottom-up composition).
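
As a rough illustration of the idea (not the code in the patch), a writer
that remembers whether an object or an array is currently open can reject
misuse while the data is being built.  Every name below (sketch_writer,
sw_begin, sw_int, sw_end) is hypothetical, the buffer is fixed-size, and
keys are assumed not to need escaping:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_DEPTH 16

/* Hypothetical type for illustration; not the patch's struct json_writer. */
struct sketch_writer {
	char buf[256];
	size_t len;
	char open[SKETCH_MAX_DEPTH];  /* '{' or '[' for each nesting level */
	int first[SKETCH_MAX_DEPTH];  /* nothing written yet at this level? */
	int depth;
	int err;                      /* sticky: set on the first misuse */
};

static void sw_init(struct sketch_writer *w)
{
	memset(w, 0, sizeof(*w));
}

/* Append raw text; does nothing once an error has been recorded. */
static void sw_raw(struct sketch_writer *w, const char *s)
{
	size_t n = strlen(s);

	if (w->err)
		return;
	if (w->len + n >= sizeof(w->buf)) {
		w->err = 1;
		return;
	}
	memcpy(w->buf + w->len, s, n + 1);
	w->len += n;
}

/* Check key usage against the open container, then emit "," and the key. */
static void sw_prefix(struct sketch_writer *w, const char *key)
{
	if (w->depth > 0) {
		int in_object = w->open[w->depth - 1] == '{';

		/* objects require a key, arrays forbid one */
		if (in_object != (key != NULL)) {
			w->err = 1;
			return;
		}
		if (!w->first[w->depth - 1])
			sw_raw(w, ",");
		w->first[w->depth - 1] = 0;
	} else if (key || w->len) {
		/* top level: exactly one value, and no key */
		w->err = 1;
		return;
	}
	if (key) {
		sw_raw(w, "\"");
		sw_raw(w, key);
		sw_raw(w, "\":");
	}
}

/* Open a nested object ('{') or array ('[') inline. */
static void sw_begin(struct sketch_writer *w, char kind, const char *key)
{
	assert(kind == '{' || kind == '[');
	sw_prefix(w, key);
	if (w->err)
		return;
	if (w->depth == SKETCH_MAX_DEPTH) {
		w->err = 1;
		return;
	}
	w->open[w->depth] = kind;
	w->first[w->depth] = 1;
	w->depth++;
	sw_raw(w, kind == '{' ? "{" : "[");
}

/* Close whichever container is currently open. */
static void sw_end(struct sketch_writer *w)
{
	if (w->depth == 0) {
		w->err = 1;
		return;
	}
	w->depth--;
	sw_raw(w, w->open[w->depth] == '{' ? "}" : "]");
}

static void sw_int(struct sketch_writer *w, const char *key, int value)
{
	char tmp[32];

	sw_prefix(w, key);
	snprintf(tmp, sizeof(tmp), "%d", value);
	sw_raw(w, tmp);
}
```

In this sketch, calling sw_begin(&w, '{', NULL), sw_int(&w, "a", 1),
sw_begin(&w, '[', "b"), sw_int(&w, NULL, 2), sw_end(&w), sw_end(&w) builds
a nested value inline, while passing a key inside an array (or omitting one
inside an object) sets w.err instead of emitting malformed output.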

The test helper has been updated to include both the original unit tests and
a new scripting API to allow individual tests to be written directly in our
t/t*.sh shell scripts.


TODO
====

I still don't know what to do about the Unicode/UTF-8 questions that
were raised WRT strings.  Pathnames on Linux can be any sequence of 8-bit
characters -- this is likely to be UTF-8 on modern systems.  Pathnames on
Windows are UCS-2/UTF-16 in the filesystem and we always convert to/from
UTF-8 when moving between git data structures and IO calls.

There are a few other fields (like author name) that we may want to log which
may or may not be UTF-8, but that is beyond our control.  Even localized error
messages may be problematic if they include other fields.

So, I'm not sure we have a route to get UTF-8-clean data out of Git, and if
we do, it is beyond the scope of this patch series.

So I think for our uses here, defining this as "JSON-like" is probably the
best answer.  We write the strings as we received them (from the file system,
the index, or whatever).  These strings are properly escaped WRT double
quotes, backslashes, and control characters, so we shouldn't have an issue
with decoders getting out of sync -- only with them rejecting non-UTF-8
sequences.
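
A minimal sketch of that escaping policy, under stated assumptions: the
function name sketch_quote and its fixed-buffer interface are made up for
illustration and are not the routines in the patch.  Double quotes,
backslashes, and control characters are escaped; bytes with the high bit
set are copied through untouched, so the output is strict JSON only when
the input already was valid UTF-8:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): write `in` into `out` as a
 * double-quoted, JSON-like string.  Returns the number of bytes written
 * (excluding the terminating NUL), or -1 if `out` is too small.
 */
static int sketch_quote(const char *in, char *out, size_t outlen)
{
	const unsigned char *p;
	size_t n = 0;

	assert(in && out);
	if (outlen < 3)
		return -1;
	out[n++] = '"';
	for (p = (const unsigned char *)in; *p; p++) {
		/*
		 * Reserve room for the longest escape (6 bytes) plus the
		 * closing quote and the NUL.
		 */
		if (n + 8 > outlen)
			return -1;
		switch (*p) {
		case '"':  out[n++] = '\\'; out[n++] = '"';  break;
		case '\\': out[n++] = '\\'; out[n++] = '\\'; break;
		case '\n': out[n++] = '\\'; out[n++] = 'n';  break;
		case '\r': out[n++] = '\\'; out[n++] = 'r';  break;
		case '\t': out[n++] = '\\'; out[n++] = 't';  break;
		case '\b': out[n++] = '\\'; out[n++] = 'b';  break;
		case '\f': out[n++] = '\\'; out[n++] = 'f';  break;
		default:
			if (*p < 0x20)
				/* remaining control characters: hex escape */
				n += (size_t)snprintf(out + n, outlen - n,
						      "\\u%04x", (unsigned)*p);
			else
				/* printable ASCII and high-bit bytes: as-is */
				out[n++] = (char)*p;
			break;
		}
	}
	out[n++] = '"';
	out[n] = '\0';
	return (int)n;
}
```

Because every quote and backslash in the input is escaped, a decoder can
always find the end of the string; the only way it can fail is by rejecting
a byte sequence that is not valid UTF-8.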

We could blindly \uXXXX encode each of the high-bit characters, if that would
help the parsers, but I don't want to do that right now.

WRT binary data, I had not intended to use this for binary data.  And without
knowing what kinds or quantity of binary data we might use it for, I'd like
to ignore this for now.


Jeff Hostetler (1):
  json_writer: new routines to create data in JSON format

 Makefile                    |   2 +
 json-writer.c               | 321 +++++++++++++++++++++++++++++++++
 json-writer.h               |  86 +++++++++
 t/helper/test-json-writer.c | 420 ++++++++++++++++++++++++++++++++++++++++++++
 t/t0019-json-writer.sh      | 102 +++++++++++
 5 files changed, 931 insertions(+)
 create mode 100644 json-writer.c
 create mode 100644 json-writer.h
 create mode 100644 t/helper/test-json-writer.c
 create mode 100755 t/t0019-json-writer.sh

-- 
2.9.3



Thread overview: 10+ messages
2018-03-21 19:28 [PATCH v2] routines to generate JSON data git
2018-03-21 19:28 ` [PATCH v2] json_writer: new routines to create data in JSON format git
2018-03-21 21:25   ` Junio C Hamano
2018-03-23 14:13     ` Jeff Hostetler
2018-03-23 18:01   ` René Scharfe
2018-03-23 19:55     ` Jeff Hostetler
2018-03-23 20:11       ` René Scharfe
2018-03-23 20:32         ` Jeff Hostetler
2018-03-22  6:05 ` [PATCH v2] routines to generate JSON data Jeff King
2018-03-22  8:38 ` Ævar Arnfjörð Bjarmason
