git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/8] Add 'ls-files --json' to dump the index in json
@ 2019-06-19  9:58 Nguyễn Thái Ngọc Duy
  2019-06-19  9:58 ` [PATCH 1/8] ls-files: add --json to dump the index Nguyễn Thái Ngọc Duy
                   ` (12 more replies)
  0 siblings, 13 replies; 36+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-06-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This is probably just my itch. Every time I have to do something with
the index, I need to add a little bit of code here and a little bit there
to get a better "view" of the index.

This solves it for me. It allows me to see pretty much everything in the
index (except really low-level details like pathname compression). It's
readable by humans, but also easy to parse if you need to do statistics
and stuff. You could even do a "diff" between two indexes.

I'm not really sure if anybody else finds this useful. If not,
I guess there's not much point trying to merge it to git.git just for a
single user. Maintaining it out of tree is still a pain for me, but I
think I can manage it.

Nguyễn Thái Ngọc Duy (8):
  ls-files: add --json to dump the index
  split-index.c: dump "link" extension as json
  fsmonitor.c: dump "FSMN" extension as json
  resolve-undo.c: dump "REUC" extension as json
  read-cache.c: dump "EOIE" extension as json
  read-cache.c: dump "IEOT" extension as json
  cache-tree.c: dump "TREE" extension as json
  dir.c: dump "UNTR" extension as json

 Documentation/git-ls-files.txt |   5 ++
 builtin/ls-files.c             |  30 +++++--
 cache-tree.c                   |  41 ++++++++--
 cache-tree.h                   |   5 +-
 cache.h                        |   2 +
 dir.c                          |  56 ++++++++++++-
 dir.h                          |   4 +-
 fsmonitor.c                    |   9 +++
 json-writer.c                  |  30 +++++++
 json-writer.h                  |  29 +++++++
 read-cache.c                   | 139 ++++++++++++++++++++++++++++++---
 resolve-undo.c                 |  36 ++++++++-
 resolve-undo.h                 |   4 +-
 split-index.c                  |  13 ++-
 14 files changed, 376 insertions(+), 27 deletions(-)

-- 
2.22.0.rc0.322.g2b0371e29a


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/8] ls-files: add --json to dump the index
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
@ 2019-06-19  9:58 ` Nguyễn Thái Ngọc Duy
  2019-06-19 10:30   ` Ævar Arnfjörð Bjarmason
  2019-06-19 13:03   ` Derrick Stolee
  2019-06-19  9:58 ` [PATCH 2/8] split-index.c: dump "link" extension as json Nguyễn Thái Ngọc Duy
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 36+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-06-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

So far we don't have a command to basically dump the index file out,
in all its gory detail. Checking some info, for example stat
time, usually involves either writing new code or firing up "xxd" and
decoding the values yourself.

This --json is supposed to help with that. It dumps the index in a
human-readable format that is also easy to process with tools. And it
will print almost enough info to reconstruct the index later.

In this patch we only dump the main part, not extensions. But at the
end of the series, the entire index is dumped. The end result could be
very verbose even on a small repository such as git.git.
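
As a rough illustration of the per-entry output described above, here is
a minimal standalone sketch. It is not the patch's code: it uses plain
snprintf() instead of json-writer, emits compact rather than
pretty-printed JSON, and does not escape the name. The DEMO_* constants
are assumed to mirror CE_VALID and the stage mask/shift in cache.h; they
do not appear in this patch. The point is that the mode is written as a
six-digit octal string and that flag bits are decoded into extra keys.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror cache.h; these values are not shown in the patch. */
#define DEMO_CE_STAGEMASK  0x3000
#define DEMO_CE_STAGESHIFT 12
#define DEMO_CE_VALID      0x8000

/*
 * Format one index entry as a compact JSON object into out.
 * Returns 0 on success, -1 if out is too small.
 * Hypothetical helper; the name is not JSON-escaped here.
 */
static int demo_format_entry(char *out, size_t outlen, const char *name,
			     unsigned int mode, unsigned int flags)
{
	unsigned int stage = (flags & DEMO_CE_STAGEMASK) >> DEMO_CE_STAGESHIFT;
	char extra[64] = "";
	int n;

	assert(out && name);

	/* Redundant with "flags", but saves decoding the bits by hand. */
	if (flags & DEMO_CE_VALID)
		strcat(extra, ",\"assume-unchanged\":true");
	if (stage)
		snprintf(extra + strlen(extra), sizeof(extra) - strlen(extra),
			 ",\"stage\":%u", stage);

	/* The mode goes out as a string such as "100644", not a number. */
	n = snprintf(out, outlen,
		     "{\"name\":\"%s\",\"mode\":\"%06o\",\"flags\":%u%s}",
		     name, mode, flags, extra);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Writing the mode as a string keeps the familiar octal form ("100644"),
which a JSON number could not represent without converting to decimal.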

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-ls-files.txt |  5 +++
 builtin/ls-files.c             | 30 +++++++++++---
 cache.h                        |  2 +
 json-writer.c                  | 16 ++++++++
 json-writer.h                  | 21 ++++++++++
 read-cache.c                   | 73 +++++++++++++++++++++++++++++++++-
 6 files changed, 140 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 8461c0e83e..54011c8f65 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -60,6 +60,11 @@ OPTIONS
 --stage::
 	Show staged contents' mode bits, object name and stage number in the output.
 
+--json::
+	Dump the entire index content in JSON format. This is for
+	debugging purposes and the JSON structure may change from time
+	to time.
+
 --directory::
 	If a whole directory is classified as "other", show just its
 	name (with a trailing slash) and not its whole contents.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 7f83c9a6f2..d00f6d3074 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "repository.h"
 #include "config.h"
+#include "json-writer.h"
 #include "quote.h"
 #include "dir.h"
 #include "builtin.h"
@@ -31,6 +32,7 @@ static int show_modified;
 static int show_killed;
 static int show_valid_bit;
 static int show_fsmonitor_bit;
+static int show_json;
 static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
@@ -543,6 +545,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("show staged contents' object name in the output")),
 		OPT_BOOL('k', "killed", &show_killed,
 			N_("show files on the filesystem that need to be removed")),
+		OPT_BOOL(0, "json", &show_json,
+			N_("dump index content in json format")),
 		OPT_BIT(0, "directory", &dir.flags,
 			N_("show 'other' directories' names only"),
 			DIR_SHOW_OTHER_DIRECTORIES),
@@ -660,8 +664,12 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 
 	/* With no flags, we default to showing the cached files */
 	if (!(show_stage || show_deleted || show_others || show_unmerged ||
-	      show_killed || show_modified || show_resolve_undo))
+	      show_killed || show_modified || show_resolve_undo || show_json))
 		show_cached = 1;
+	if (show_json && (show_stage || show_deleted || show_others ||
+			  show_unmerged || show_killed || show_modified ||
+			  show_cached || show_resolve_undo || with_tree))
+		die(_("--json cannot be used with other --show- options, or --with-tree"));
 
 	if (with_tree) {
 		/*
@@ -673,10 +681,22 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		overlay_tree_on_index(the_repository->index, with_tree, max_prefix);
 	}
 
-	show_files(the_repository, &dir);
-
-	if (show_resolve_undo)
-		show_ru_info(the_repository->index);
+	if (!show_json) {
+		show_files(the_repository, &dir);
+
+		if (show_resolve_undo)
+			show_ru_info(the_repository->index);
+	} else {
+		struct json_writer jw = JSON_WRITER_INIT;
+
+		discard_index(the_repository->index);
+		the_repository->index->jw = &jw;
+		if (repo_read_index(the_repository) < 0)
+			die("index file corrupt");
+		puts(jw.json.buf);
+		the_repository->index->jw = NULL;
+		jw_release(&jw);
+	}
 
 	if (ps_matched) {
 		int bad;
diff --git a/cache.h b/cache.h
index bf20337ef4..84d0aeed20 100644
--- a/cache.h
+++ b/cache.h
@@ -326,6 +326,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define UNTRACKED_CHANGED	(1 << 7)
 #define FSMONITOR_CHANGED	(1 << 8)
 
+struct json_writer;
 struct split_index;
 struct untracked_cache;
 
@@ -350,6 +351,7 @@ struct index_state {
 	uint64_t fsmonitor_last_update;
 	struct ewah_bitmap *fsmonitor_dirty;
 	struct mem_pool *ce_mem_pool;
+	struct json_writer *jw;
 };
 
 /* Name hashing */
diff --git a/json-writer.c b/json-writer.c
index aadb9dbddc..281bc50b39 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -202,6 +202,22 @@ void jw_object_null(struct json_writer *jw, const char *key)
 	strbuf_addstr(&jw->json, "null");
 }
 
+void jw_object_stat_data(struct json_writer *jw, const char *name,
+			 const struct stat_data *sd)
+{
+	jw_object_inline_begin_object(jw, name);
+	jw_object_intmax(jw, "st_ctime.sec", sd->sd_ctime.sec);
+	jw_object_intmax(jw, "st_ctime.nsec", sd->sd_ctime.nsec);
+	jw_object_intmax(jw, "st_mtime.sec", sd->sd_mtime.sec);
+	jw_object_intmax(jw, "st_mtime.nsec", sd->sd_mtime.nsec);
+	jw_object_intmax(jw, "st_dev", sd->sd_dev);
+	jw_object_intmax(jw, "st_ino", sd->sd_ino);
+	jw_object_intmax(jw, "st_uid", sd->sd_uid);
+	jw_object_intmax(jw, "st_gid", sd->sd_gid);
+	jw_object_intmax(jw, "st_size", sd->sd_size);
+	jw_end(jw);
+}
+
 static void increase_indent(struct strbuf *sb,
 			    const struct json_writer *jw,
 			    int indent)
diff --git a/json-writer.h b/json-writer.h
index 83906b09c1..38f9c9bf68 100644
--- a/json-writer.h
+++ b/json-writer.h
@@ -44,6 +44,8 @@
 
 #include "strbuf.h"
 
+struct stat_data;
+
 struct json_writer
 {
 	/*
@@ -81,6 +83,8 @@ void jw_object_true(struct json_writer *jw, const char *key);
 void jw_object_false(struct json_writer *jw, const char *key);
 void jw_object_bool(struct json_writer *jw, const char *key, int value);
 void jw_object_null(struct json_writer *jw, const char *key);
+void jw_object_stat_data(struct json_writer *jw, const char *key,
+			 const struct stat_data *sd);
 void jw_object_sub_jw(struct json_writer *jw, const char *key,
 		      const struct json_writer *value);
 
@@ -104,4 +108,21 @@ void jw_array_inline_begin_array(struct json_writer *jw);
 int jw_is_terminated(const struct json_writer *jw);
 void jw_end(struct json_writer *jw);
 
+/*
+ * These _gently versions accept a NULL json_writer to avoid excessive
+ * branching at the call site.
+ */
+static inline void jw_object_inline_begin_array_gently(struct json_writer *jw,
+						       const char *name)
+{
+	if (jw)
+		jw_object_inline_begin_array(jw, name);
+}
+
+static inline void jw_end_gently(struct json_writer *jw)
+{
+	if (jw)
+		jw_end(jw);
+}
+
 #endif /* JSON_WRITER_H */
diff --git a/read-cache.c b/read-cache.c
index 4dd22f4f6e..eec030b3bb 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -25,6 +25,7 @@
 #include "fsmonitor.h"
 #include "thread-utils.h"
 #include "progress.h"
+#include "json-writer.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -1952,6 +1953,50 @@ static void *load_index_extensions(void *_data)
 	return NULL;
 }
 
+static void dump_cache_entry(struct index_state *istate,
+			     int index,
+			     unsigned long offset,
+			     const struct cache_entry *ce)
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct json_writer *jw = istate->jw;
+
+	jw_array_inline_begin_object(jw);
+
+	/*
+	 * this is technically redundant, but it's for easier
+	 * navigation when there are hundreds of entries
+	 */
+	jw_object_intmax(jw, "id", index);
+
+	jw_object_string(jw, "name", ce->name);
+
+	strbuf_addf(&sb, "%06o", ce->ce_mode);
+	jw_object_string(jw, "mode", sb.buf);
+	strbuf_release(&sb);
+
+	jw_object_intmax(jw, "flags", ce->ce_flags);
+	/*
+	 * again redundant info, just so you don't have to decode
+	 * flags values manually
+	 */
+	if (ce->ce_flags & CE_VALID)
+		jw_object_true(jw, "assume-unchanged");
+	if (ce->ce_flags & CE_INTENT_TO_ADD)
+		jw_object_true(jw, "intent-to-add");
+	if (ce->ce_flags & CE_SKIP_WORKTREE)
+		jw_object_true(jw, "skip-worktree");
+	if (ce_stage(ce))
+		jw_object_intmax(jw, "stage", ce_stage(ce));
+
+	jw_object_string(jw, "oid", oid_to_hex(&ce->oid));
+
+	jw_object_stat_data(jw, "stat", &ce->ce_stat_data);
+	jw_object_intmax(jw, "file-offset", offset);
+
+	jw_end(jw);
+}
+
 /*
  * A helper function that will load the specified range of cache entries
  * from the memory mapped file and add them to the given index.
@@ -1972,6 +2017,9 @@ static unsigned long load_cache_entry_block(struct index_state *istate,
 		ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, &consumed, previous_ce);
 		set_index_entry(istate, i, ce);
 
+		if (istate->jw)
+			dump_cache_entry(istate, i, src_offset, ce);
+
 		src_offset += consumed;
 		previous_ce = ce;
 	}
@@ -1983,6 +2031,8 @@ static unsigned long load_all_cache_entries(struct index_state *istate,
 {
 	unsigned long consumed;
 
+	jw_object_inline_begin_array_gently(istate->jw, "entries");
+
 	if (istate->version == 4) {
 		mem_pool_init(&istate->ce_mem_pool,
 				estimate_cache_size_from_compressed(istate->cache_nr));
@@ -1993,6 +2043,8 @@ static unsigned long load_all_cache_entries(struct index_state *istate,
 
 	consumed = load_cache_entry_block(istate, istate->ce_mem_pool,
 					0, istate->cache_nr, mmap, src_offset, NULL);
+
+	jw_end_gently(istate->jw);
 	return consumed;
 }
 
@@ -2120,6 +2172,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	size_t extension_offset = 0;
 	int nr_threads, cpus;
 	struct index_entry_offset_table *ieot = NULL;
+	int jw_pretty = 1;
 
 	if (istate->initialized)
 		return istate->cache_nr;
@@ -2154,6 +2207,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	istate->cache_nr = ntohl(hdr->hdr_entries);
 	istate->cache_alloc = alloc_nr(istate->cache_nr);
 	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
+	istate->timestamp.sec = st.st_mtime;
+	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 	istate->initialized = 1;
 
 	p.istate = istate;
@@ -2176,6 +2231,20 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (!HAVE_THREADS)
 		nr_threads = 1;
 
+	if (istate->jw) {
+		jw_object_begin(istate->jw, jw_pretty);
+		jw_object_intmax(istate->jw, "version", istate->version);
+		jw_object_string(istate->jw, "oid", oid_to_hex(&istate->oid));
+		jw_object_intmax(istate->jw, "st_mtime.sec", istate->timestamp.sec);
+		jw_object_intmax(istate->jw, "st_mtime.nsec", istate->timestamp.nsec);
+
+		/*
+		 * Threading may mess up json writing. This is for
+		 * debugging only, so performance is not a concern.
+		 */
+		nr_threads = 1;
+	}
+
 	if (nr_threads > 1) {
 		extension_offset = read_eoie_extension(mmap, mmap_size);
 		if (extension_offset) {
@@ -2204,8 +2273,6 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		src_offset += load_all_cache_entries(istate, mmap, mmap_size, src_offset);
 	}
 
-	istate->timestamp.sec = st.st_mtime;
-	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 
 	/* if we created a thread, join it otherwise load the extensions on the primary thread */
 	if (extension_offset) {
@@ -2216,6 +2283,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		p.src_offset = src_offset;
 		load_index_extensions(&p);
 	}
+	jw_end_gently(istate->jw);
+
 	munmap((void *)mmap, mmap_size);
 
 	/*
-- 
2.22.0.rc0.322.g2b0371e29a
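
The patch above hangs an optional json_writer off the index state and
uses NULL-tolerant "_gently" wrappers so the normal read path needs no
extra branching. The following is a minimal standalone sketch of that
pattern only. All demo_* names are hypothetical and the fixed-size
buffer stands in for a strbuf; it is not the json-writer API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct json_writer. */
struct demo_writer {
	char buf[256];
	size_t len;
};

static void demo_append(struct demo_writer *w, const char *s)
{
	size_t n = strlen(s);

	assert(w->len + n < sizeof(w->buf));
	memcpy(w->buf + w->len, s, n + 1);
	w->len += n;
}

/* The "_gently" variant: a NULL writer means "dumping is off". */
static void demo_append_gently(struct demo_writer *w, const char *s)
{
	if (w)
		demo_append(w, s);
}

/* Hypothetical stand-in for struct index_state with its new jw field. */
struct demo_index {
	int nr;
	struct demo_writer *w;
};

/* Loads nr entries; dumps them as an array only when a writer is set. */
static int demo_load_entries(struct demo_index *idx)
{
	int i, loaded = 0;

	demo_append_gently(idx->w, "[");
	for (i = 0; i < idx->nr; i++) {
		loaded++;	/* stands in for the real parsing work */
		if (idx->w) {
			char tmp[32];

			snprintf(tmp, sizeof(tmp), "%s%d", i ? "," : "", i);
			demo_append(idx->w, tmp);
		}
	}
	demo_append_gently(idx->w, "]");
	return loaded;
}
```

With w left NULL the loader behaves exactly as before; pointing it at a
writer makes the same pass produce the dump as a side effect of reading.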



* [PATCH 2/8] split-index.c: dump "link" extension as json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
  2019-06-19  9:58 ` [PATCH 1/8] ls-files: add --json to dump the index Nguyễn Thái Ngọc Duy
@ 2019-06-19  9:58 ` Nguyễn Thái Ngọc Duy
  2019-06-19  9:58 ` [PATCH 3/8] fsmonitor.c: dump "FSMN" " Nguyễn Thái Ngọc Duy
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-06-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 json-writer.c | 14 ++++++++++++++
 json-writer.h |  2 ++
 split-index.c | 13 ++++++++++++-
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/json-writer.c b/json-writer.c
index 281bc50b39..70403580ca 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "ewah/ewok.h"
 #include "json-writer.h"
 
 void jw_init(struct json_writer *jw)
@@ -218,6 +219,19 @@ void jw_object_stat_data(struct json_writer *jw, const char *name,
 	jw_end(jw);
 }
 
+static void dump_ewah_one(size_t pos, void *jw)
+{
+	jw_array_intmax(jw, pos);
+}
+
+void jw_object_ewah(struct json_writer *jw, const char *key,
+		    struct ewah_bitmap *ewah)
+{
+	jw_object_inline_begin_array(jw, key);
+	ewah_each_bit(ewah, dump_ewah_one, jw);
+	jw_end(jw);
+}
+
 static void increase_indent(struct strbuf *sb,
 			    const struct json_writer *jw,
 			    int indent)
diff --git a/json-writer.h b/json-writer.h
index 38f9c9bf68..3c173647d3 100644
--- a/json-writer.h
+++ b/json-writer.h
@@ -85,6 +85,8 @@ void jw_object_bool(struct json_writer *jw, const char *key, int value);
 void jw_object_null(struct json_writer *jw, const char *key);
 void jw_object_stat_data(struct json_writer *jw, const char *key,
 			 const struct stat_data *sd);
+void jw_object_ewah(struct json_writer *jw, const char *key,
+		    struct ewah_bitmap *ewah);
 void jw_object_sub_jw(struct json_writer *jw, const char *key,
 		      const struct json_writer *value);
 
diff --git a/split-index.c b/split-index.c
index e6154e4ea9..d7b4420c92 100644
--- a/split-index.c
+++ b/split-index.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "json-writer.h"
 #include "split-index.h"
 #include "ewah/ewok.h"
 
@@ -16,6 +17,7 @@ int read_link_extension(struct index_state *istate,
 {
 	const unsigned char *data = data_;
 	struct split_index *si;
+	unsigned long original_sz = sz;
 	int ret;
 
 	if (sz < the_hash_algo->rawsz)
@@ -25,7 +27,7 @@ int read_link_extension(struct index_state *istate,
 	data += the_hash_algo->rawsz;
 	sz -= the_hash_algo->rawsz;
 	if (!sz)
-		return 0;
+		goto done;
 	si->delete_bitmap = ewah_new();
 	ret = ewah_read_mmap(si->delete_bitmap, data, sz);
 	if (ret < 0)
@@ -38,6 +40,15 @@ int read_link_extension(struct index_state *istate,
 		return error("corrupt replace bitmap in link extension");
 	if (ret != sz)
 		return error("garbage at the end of link extension");
+done:
+	if (istate->jw) {
+		jw_object_inline_begin_object(istate->jw, "split-index");
+		jw_object_string(istate->jw, "oid", oid_to_hex(&si->base_oid));
+		jw_object_ewah(istate->jw, "delete-bitmap", si->delete_bitmap);
+		jw_object_ewah(istate->jw, "replace-bitmap", si->replace_bitmap);
+		jw_object_intmax(istate->jw, "ext-size", original_sz);
+		jw_end(istate->jw);
+	}
 	return 0;
 }
 
-- 
2.22.0.rc0.322.g2b0371e29a
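
jw_object_ewah() above dumps a bitmap as a JSON array of the positions
of its set bits, using a per-bit callback. A minimal standalone sketch
of that idea follows. It assumes a single uncompressed 64-bit word in
place of a real EWAH bitmap, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef void (*demo_bit_fn)(size_t pos, void *payload);

/* Call fn once for every set bit, lowest position first. */
static void demo_each_bit(uint64_t word, demo_bit_fn fn, void *payload)
{
	size_t pos;

	for (pos = 0; pos < 64; pos++)
		if (word & ((uint64_t)1 << pos))
			fn(pos, payload);
}

/* Hypothetical output buffer; large enough for all 64 positions. */
struct demo_array {
	char buf[256];
	size_t len;
	int nr;
};

/* Callback: append one bit position as an array element. */
static void demo_dump_one(size_t pos, void *payload)
{
	struct demo_array *a = payload;
	int n = snprintf(a->buf + a->len, sizeof(a->buf) - a->len,
			 "%s%zu", a->nr ? "," : "", pos);

	assert(n > 0 && (size_t)n < sizeof(a->buf) - a->len);
	a->len += n;
	a->nr++;
}

/* Write the set bits of word into a->buf as "[p0,p1,...]". */
static void demo_dump_bitmap(struct demo_array *a, uint64_t word)
{
	a->len = 0;
	a->nr = 0;
	a->buf[a->len++] = '[';
	a->buf[a->len] = '\0';
	demo_each_bit(word, demo_dump_one, a);
	assert(a->len + 1 < sizeof(a->buf));
	a->buf[a->len++] = ']';
	a->buf[a->len] = '\0';
}
```

Listing positions rather than raw words keeps the dump independent of
how the bitmap is compressed on disk.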



* [PATCH 3/8] fsmonitor.c: dump "FSMN" extension as json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
  2019-06-19  9:58 ` [PATCH 1/8] ls-files: add --json to dump the index Nguyễn Thái Ngọc Duy
  2019-06-19  9:58 ` [PATCH 2/8] split-index.c: dump "link" extension as json Nguyễn Thái Ngọc Duy
@ 2019-06-19  9:58 ` Nguyễn Thái Ngọc Duy
  2019-06-19  9:58 ` [PATCH 4/8] resolve-undo.c: dump "REUC" " Nguyễn Thái Ngọc Duy
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-06-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 fsmonitor.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fsmonitor.c b/fsmonitor.c
index 1dee0aded1..f6ba437255 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -3,6 +3,7 @@
 #include "dir.h"
 #include "ewah/ewok.h"
 #include "fsmonitor.h"
+#include "json-writer.h"
 #include "run-command.h"
 #include "strbuf.h"
 
@@ -50,6 +51,14 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
 	}
 	istate->fsmonitor_dirty = fsmonitor_dirty;
 
+	if (istate->jw) {
+		jw_object_inline_begin_object(istate->jw, "fsmonitor");
+		jw_object_intmax(istate->jw, "version", hdr_version);
+		jw_object_intmax(istate->jw, "last-update", istate->fsmonitor_last_update);
+		jw_object_ewah(istate->jw, "dirty", fsmonitor_dirty);
+		jw_object_intmax(istate->jw, "ext-size", sz);
+		jw_end(istate->jw);
+	}
 	trace_printf_key(&trace_fsmonitor, "read fsmonitor extension successful");
 	return 0;
 }
-- 
2.22.0.rc0.322.g2b0371e29a



* [PATCH 4/8] resolve-undo.c: dump "REUC" extension as json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2019-06-19  9:58 ` [PATCH 3/8] fsmonitor.c: dump "FSMN" " Nguyễn Thái Ngọc Duy
@ 2019-06-19  9:58 ` Nguyễn Thái Ngọc Duy
  2019-06-19 13:16   ` Derrick Stolee
  2019-06-19  9:58 ` [PATCH 5/8] read-cache.c: dump "EOIE" " Nguyễn Thái Ngọc Duy
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-06-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c   |  2 +-
 resolve-undo.c | 36 +++++++++++++++++++++++++++++++++++-
 resolve-undo.h |  4 +++-
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index eec030b3bb..3b5c63f53a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1701,7 +1701,7 @@ static int read_index_extension(struct index_state *istate,
 		istate->cache_tree = cache_tree_read(data, sz);
 		break;
 	case CACHE_EXT_RESOLVE_UNDO:
-		istate->resolve_undo = resolve_undo_read(data, sz);
+		istate->resolve_undo = resolve_undo_read(data, sz, istate->jw);
 		break;
 	case CACHE_EXT_LINK:
 		if (read_link_extension(istate, data, sz))
diff --git a/resolve-undo.c b/resolve-undo.c
index 236320f179..999020bc40 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "dir.h"
+#include "json-writer.h"
 #include "resolve-undo.h"
 #include "string-list.h"
 
@@ -49,7 +50,8 @@ void resolve_undo_write(struct strbuf *sb, struct string_list *resolve_undo)
 	}
 }
 
-struct string_list *resolve_undo_read(const char *data, unsigned long size)
+struct string_list *resolve_undo_read(const char *data, unsigned long size,
+				      struct json_writer *jw)
 {
 	struct string_list *resolve_undo;
 	size_t len;
@@ -59,6 +61,11 @@ struct string_list *resolve_undo_read(const char *data, unsigned long size)
 
 	resolve_undo = xcalloc(1, sizeof(*resolve_undo));
 	resolve_undo->strdup_strings = 1;
+	if (jw) {
+		jw_object_inline_begin_object(jw, "resolve-undo");
+		jw_object_intmax(jw, "ext-size", size);
+		jw_object_inline_begin_array(jw, "entries");
+	}
 
 	while (size) {
 		struct string_list_item *lost;
@@ -94,6 +101,33 @@ struct string_list *resolve_undo_read(const char *data, unsigned long size)
 			size -= rawsz;
 			data += rawsz;
 		}
+
+		if (jw) {
+			struct strbuf sb = STRBUF_INIT;
+
+			jw_array_inline_begin_object(jw);
+			jw_object_string(jw, "path", lost->string);
+
+			jw_object_inline_begin_array(jw, "mode");
+			for (i = 0; i < 3; i++) {
+				strbuf_addf(&sb, "%06o", ui->mode[i]);
+				jw_array_string(jw, sb.buf);
+				strbuf_reset(&sb);
+			}
+			jw_end(jw);
+
+			jw_object_inline_begin_array(jw, "oid");
+			for (i = 0; i < 3; i++)
+				jw_array_string(jw, oid_to_hex(&ui->oid[i]));
+			jw_end(jw);
+
+			jw_end(jw);
+			strbuf_release(&sb);
+		}
+	}
+	if (jw) {
+		jw_end(jw);	/* entries */
+		jw_end(jw);	/* resolve-undo */
 	}
 	return resolve_undo;
 
diff --git a/resolve-undo.h b/resolve-undo.h
index 2b3f0f901e..46b4e93a7e 100644
--- a/resolve-undo.h
+++ b/resolve-undo.h
@@ -3,6 +3,8 @@
 
 #include "cache.h"
 
+struct json_writer;
+
 struct resolve_undo_info {
 	unsigned int mode[3];
 	struct object_id oid[3];
@@ -10,7 +12,7 @@ struct resolve_undo_info {
 
 void record_resolve_undo(struct index_state *, struct cache_entry *);
 void resolve_undo_write(struct strbuf *, struct string_list *);
-struct string_list *resolve_undo_read(const char *, unsigned long);
+struct string_list *resolve_undo_read(const char *, unsigned long, struct json_writer *);
 void resolve_undo_clear_index(struct index_state *);
 int unmerge_index_entry_at(struct index_state *, int);
 void unmerge_index(struct index_state *, const struct pathspec *);
-- 
2.22.0.rc0.322.g2b0371e29a



* [PATCH 5/8] read-cache.c: dump "EOIE" extension as json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2019-06-19  9:58 ` [PATCH 4/8] resolve-undo.c: dump "REUC" " Nguyễn Thái Ngọc Duy
@ 2019-06-19  9:58 ` Nguyễn Thái Ngọc Duy
  2019-06-19  9:58 ` [PATCH 6/8] read-cache.c: dump "IEOT" " Nguyễn Thái Ngọc Duy
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-06-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 3b5c63f53a..04863c3853 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1914,7 +1914,7 @@ struct index_entry_offset_table
 static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset);
 static void write_ieot_extension(struct strbuf *sb, struct index_entry_offset_table *ieot);
 
-static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
+static size_t read_eoie_extension(const char *mmap, size_t mmap_size, struct json_writer *jw);
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset);
 
 struct load_index_extensions
@@ -2243,10 +2243,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		 * debugging only, so performance is not a concern.
 		 */
 		nr_threads = 1;
+		/* and dump EOIE extension even with threading off */
+		read_eoie_extension(mmap, mmap_size, istate->jw);
 	}
 
 	if (nr_threads > 1) {
-		extension_offset = read_eoie_extension(mmap, mmap_size);
+		extension_offset = read_eoie_extension(mmap, mmap_size, NULL);
 		if (extension_offset) {
 			int err;
 
@@ -3504,7 +3506,8 @@ int should_validate_cache_entries(void)
 #define EOIE_SIZE (4 + GIT_SHA1_RAWSZ) /* <4-byte offset> + <20-byte hash> */
 #define EOIE_SIZE_WITH_HEADER (4 + 4 + EOIE_SIZE) /* <4-byte signature> + <4-byte length> + EOIE_SIZE */
 
-static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
+static size_t read_eoie_extension(const char *mmap, size_t mmap_size,
+				  struct json_writer *jw)
 {
 	/*
 	 * The end of index entries (EOIE) extension is guaranteed to be last
@@ -3548,6 +3551,12 @@ static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
 		return 0;
 	index += sizeof(uint32_t);
 
+	if (jw) {
+		jw_object_inline_begin_object(jw, "end-of-index");
+		jw_object_intmax(jw, "offset", offset);
+		jw_object_intmax(jw, "ext-size", extsize);
+		jw_object_inline_begin_array(jw, "extensions");
+	}
 	/*
 	 * The hash is computed over extension types and their sizes (but not
 	 * their contents).  E.g. if we have "TREE" extension that is N-bytes
@@ -3576,9 +3585,24 @@ static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
 
 		the_hash_algo->update_fn(&c, mmap + src_offset, 8);
 
+		if (jw) {
+			char name[5];
+
+			jw_array_inline_begin_object(jw);
+			memcpy(name, mmap + src_offset, 4);
+			name[4] = '\0';
+			jw_object_string(jw, "name",  name);
+			jw_object_intmax(jw, "size", extsize);
+			jw_end(jw);
+		}
+
 		src_offset += 8;
 		src_offset += extsize;
 	}
+	if (jw) {
+		jw_end(jw);	/* extensions */
+		jw_end(jw);	/* end-of-index */
+	}
 	the_hash_algo->final_fn(hash, &c);
 	if (!hasheq(hash, (const unsigned char *)index))
 		return 0;
-- 
2.22.0.rc0.322.g2b0371e29a
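
The EOIE dump above lists each extension's name and size while walking
the extension area. Each extension record is a 4-byte signature, a
4-byte big-endian size, and then that many bytes of payload. Below is a
minimal standalone sketch of such a walk over an in-memory buffer. It
leaves out the hash check and the EOIE trailer that the real function
handles, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a 4-byte big-endian integer. */
static uint32_t demo_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Walk consecutive extension records in buf[0..len) and append
 * "NAME:size " to out for each. Returns the number of extensions,
 * or -1 if a record overruns the buffer or out is too small.
 */
static int demo_list_extensions(const unsigned char *buf, size_t len,
				char *out, size_t outlen)
{
	size_t off = 0, used = 0;
	int count = 0;

	assert(outlen > 0);
	out[0] = '\0';
	while (off < len) {
		char name[5];
		uint32_t extsize;
		int n;

		if (len - off < 8)
			return -1;	/* truncated header */
		memcpy(name, buf + off, 4);
		name[4] = '\0';
		extsize = demo_get_be32(buf + off + 4);
		if (extsize > len - off - 8)
			return -1;	/* payload runs past the end */

		n = snprintf(out + used, outlen - used, "%s:%u ",
			     name, (unsigned int)extsize);
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += n;

		off += 8 + extsize;	/* skip header and payload */
		count++;
	}
	return count;
}
```

Only the 8-byte headers are inspected; payloads are skipped by size,
which is why the walk needs no knowledge of any individual extension.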



* [PATCH 6/8] read-cache.c: dump "IEOT" extension as json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2019-06-19  9:58 ` [PATCH 5/8] read-cache.c: dump "EOIE" " Nguyễn Thái Ngọc Duy
@ 2019-06-19  9:58 ` Nguyễn Thái Ngọc Duy
  2019-06-19 13:18   ` Derrick Stolee
  2019-06-19  9:58 ` [PATCH 7/8] cache-tree.c: dump "TREE" " Nguyễn Thái Ngọc Duy
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-06-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 04863c3853..200834e77e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1911,7 +1911,7 @@ struct index_entry_offset_table
 	struct index_entry_offset entries[FLEX_ARRAY];
 };
 
-static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset);
+static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset, struct json_writer *jw);
 static void write_ieot_extension(struct strbuf *sb, struct index_entry_offset_table *ieot);
 
 static size_t read_eoie_extension(const char *mmap, size_t mmap_size, struct json_writer *jw);
@@ -2232,6 +2232,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		nr_threads = 1;
 
 	if (istate->jw) {
+		size_t off;
+
 		jw_object_begin(istate->jw, jw_pretty);
 		jw_object_intmax(istate->jw, "version", istate->version);
 		jw_object_string(istate->jw, "oid", oid_to_hex(&istate->oid));
@@ -2243,8 +2245,11 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		 * debugging only, so performance is not a concern.
 		 */
 		nr_threads = 1;
-		/* and dump EOIE extension even with threading off */
-		read_eoie_extension(mmap, mmap_size, istate->jw);
+		/* and dump EOIE/IEOT extensions even with threading off */
+		off = read_eoie_extension(mmap, mmap_size, istate->jw);
+		if (off)
+			free(read_ieot_extension(mmap, mmap_size,
+						 off, istate->jw));
 	}
 
 	if (nr_threads > 1) {
@@ -2266,7 +2271,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	 * to multi-thread the reading of the cache entries.
 	 */
 	if (extension_offset && nr_threads > 1)
-		ieot = read_ieot_extension(mmap, mmap_size, extension_offset);
+		ieot = read_ieot_extension(mmap, mmap_size, extension_offset, NULL);
 
 	if (ieot) {
 		src_offset += load_cache_entries_threaded(istate, mmap, mmap_size, nr_threads, ieot);
@@ -3630,7 +3635,9 @@ static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context,
 
 #define IEOT_VERSION	(1)
 
-static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset)
+static struct index_entry_offset_table *read_ieot_extension(
+	const char *mmap, size_t mmap_size,
+	size_t offset, struct json_writer *jw)
 {
 	const char *index = NULL;
 	uint32_t extsize, ext_version;
@@ -3666,6 +3673,12 @@ static struct index_entry_offset_table *read_ieot_extension(const char *mmap, si
 		error("invalid number of IEOT entries %d", nr);
 		return NULL;
 	}
+	if (jw) {
+		jw_object_inline_begin_object(jw, "index-entry-offsets");
+		jw_object_intmax(jw, "version", ext_version);
+		jw_object_intmax(jw, "ext-size", extsize);
+		jw_object_inline_begin_array(jw, "entries");
+	}
 	ieot = xmalloc(sizeof(struct index_entry_offset_table)
 		       + (nr * sizeof(struct index_entry_offset)));
 	ieot->nr = nr;
@@ -3674,6 +3687,17 @@ static struct index_entry_offset_table *read_ieot_extension(const char *mmap, si
 		index += sizeof(uint32_t);
 		ieot->entries[i].nr = get_be32(index);
 		index += sizeof(uint32_t);
+
+		if (jw) {
+			jw_array_inline_begin_object(jw);
+			jw_object_intmax(jw, "offset", ieot->entries[i].offset);
+			jw_object_intmax(jw, "count", ieot->entries[i].nr);
+			jw_end(jw);
+		}
+	}
+	if (jw) {
+		jw_end(jw);	/* entries */
+		jw_end(jw);	/* index-entry-offsets */
 	}
 
 	return ieot;
-- 
2.22.0.rc0.322.g2b0371e29a



* [PATCH 7/8] cache-tree.c: dump "TREE" extension as json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
                   ` (5 preceding siblings ...)
  2019-06-19  9:58 ` [PATCH 6/8] read-cache.c: dump "IEOT" " Nguyễn Thái Ngọc Duy
@ 2019-06-19  9:58 ` Nguyễn Thái Ngọc Duy
  2019-06-19  9:58 ` [PATCH 8/8] dir.c: dump "UNTR" " Nguyễn Thái Ngọc Duy
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-06-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache-tree.c | 41 ++++++++++++++++++++++++++++++++++++-----
 cache-tree.h |  5 ++++-
 read-cache.c |  2 +-
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index b13bfaf71e..fc44016fe8 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -3,6 +3,7 @@
 #include "tree.h"
 #include "tree-walk.h"
 #include "cache-tree.h"
+#include "json-writer.h"
 #include "object-store.h"
 #include "replace-object.h"
 
@@ -492,7 +493,8 @@ void cache_tree_write(struct strbuf *sb, struct cache_tree *root)
 	write_one(sb, root, "", 0);
 }
 
-static struct cache_tree *read_one(const char **buffer, unsigned long *size_p)
+static struct cache_tree *read_one(const char **buffer, unsigned long *size_p,
+				   struct json_writer *jw)
 {
 	const char *buf = *buffer;
 	unsigned long size = *size_p;
@@ -546,6 +548,15 @@ static struct cache_tree *read_one(const char **buffer, unsigned long *size_p)
 			*buffer, subtree_nr);
 #endif
 
+	if (jw) {
+		if (it->entry_count >= 0) {
+			jw_object_string(jw, "oid", oid_to_hex(&it->oid));
+			jw_object_intmax(jw, "entry_count", it->entry_count);
+		} else {
+			jw_object_null(jw, "oid");
+		}
+		jw_object_inline_begin_array(jw, "subdirs");
+	}
 	/*
 	 * Just a heuristic -- we do not add directories that often but
 	 * we do not want to have to extend it immediately when we do,
@@ -559,12 +570,18 @@ static struct cache_tree *read_one(const char **buffer, unsigned long *size_p)
 		struct cache_tree_sub *subtree;
 		const char *name = buf;
 
-		sub = read_one(&buf, &size);
+		if (jw) {
+			jw_array_inline_begin_object(jw);
+			jw_object_string(jw, "name", name);
+		}
+		sub = read_one(&buf, &size, jw);
+		jw_end_gently(jw);
 		if (!sub)
 			goto free_return;
 		subtree = cache_tree_sub(it, name);
 		subtree->cache_tree = sub;
 	}
+	jw_end_gently(jw);
 	if (subtree_nr != it->subtree_nr)
 		die("cache-tree: internal error");
 	*buffer = buf;
@@ -576,11 +593,25 @@ static struct cache_tree *read_one(const char **buffer, unsigned long *size_p)
 	return NULL;
 }
 
-struct cache_tree *cache_tree_read(const char *buffer, unsigned long size)
+struct cache_tree *cache_tree_read(const char *buffer, unsigned long size,
+				   struct json_writer *jw)
 {
+	struct cache_tree *ret;
+
+	if (jw) {
+		jw_object_inline_begin_object(jw, "cache-tree");
+		jw_object_intmax(jw, "ext-size", size);
+		jw_object_inline_begin_object(jw, "root");
+	}
 	if (buffer[0])
-		return NULL; /* not the whole tree */
-	return read_one(&buffer, &size);
+		ret = NULL; /* not the whole tree */
+	else
+		ret = read_one(&buffer, &size, jw);
+	if (jw) {
+		jw_end(jw);	/* root */
+		jw_end(jw);	/* cache-tree */
+	}
+	return ret;
 }
 
 static struct cache_tree *cache_tree_find(struct cache_tree *it, const char *path)
diff --git a/cache-tree.h b/cache-tree.h
index 757bbc48bc..fc3c73284b 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -6,6 +6,8 @@
 #include "tree-walk.h"
 
 struct cache_tree;
+struct json_writer;
+
 struct cache_tree_sub {
 	struct cache_tree *cache_tree;
 	int count;		/* internally used by update_one() */
@@ -28,7 +30,8 @@ void cache_tree_invalidate_path(struct index_state *, const char *);
 struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
-struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
+struct cache_tree *cache_tree_read(const char *buffer, unsigned long size,
+				   struct json_writer *jw);
 
 int cache_tree_fully_valid(struct cache_tree *);
 int cache_tree_update(struct index_state *, int);
diff --git a/read-cache.c b/read-cache.c
index 200834e77e..289705b816 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1698,7 +1698,7 @@ static int read_index_extension(struct index_state *istate,
 {
 	switch (CACHE_EXT(ext)) {
 	case CACHE_EXT_TREE:
-		istate->cache_tree = cache_tree_read(data, sz);
+		istate->cache_tree = cache_tree_read(data, sz, istate->jw);
 		break;
 	case CACHE_EXT_RESOLVE_UNDO:
 		istate->resolve_undo = resolve_undo_read(data, sz, istate->jw);
-- 
2.22.0.rc0.322.g2b0371e29a



* [PATCH 8/8] dir.c: dump "UNTR" extension as json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
                   ` (6 preceding siblings ...)
  2019-06-19  9:58 ` [PATCH 7/8] cache-tree.c: dump "TREE" " Nguyễn Thái Ngọc Duy
@ 2019-06-19  9:58 ` Nguyễn Thái Ngọc Duy
  2019-06-19 11:58 ` [PATCH 0/8] Add 'ls-files --json' to dump the index in json Derrick Stolee
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-06-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

The big part of UNTR extension is dumped at the end instead of dumping
as soon as we read it, because we actually "patch" some fields in
untracked_cache_dir with EWAH bitmaps at the end.
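
A minimal sketch of why the dump has to wait, assuming a plain uint32_t
in place of the EWAH bitmap. The names demo_dir, apply_valid_bits and
dump_dirs are hypothetical and not part of git; the point is only that
"valid" is not known until the bitmap pass has run:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct untracked_cache_dir. */
struct demo_dir {
	const char *name;
	int valid;	/* only known after the bitmap pass */
};

/*
 * Second pass: bit i of "valid_bits" marks dirs[i] as valid.
 * A plain uint32_t stands in for the EWAH bitmap here.
 */
static void apply_valid_bits(struct demo_dir *dirs, int nr, uint32_t valid_bits)
{
	int i;

	assert(nr >= 0 && nr <= 32);
	for (i = 0; i < nr; i++)
		dirs[i].valid = (valid_bits >> i) & 1;
}

/* Dump only after patching; returns the number of bytes written. */
static size_t dump_dirs(char *out, size_t outsz,
			const struct demo_dir *dirs, int nr)
{
	size_t len = 0;
	int i;

	assert(outsz > 0);
	out[0] = '\0';
	for (i = 0; i < nr; i++) {
		int n = snprintf(out + len, outsz - len,
				 "%s{\"name\":\"%s\",\"valid\":%s}",
				 i ? "," : "", dirs[i].name,
				 dirs[i].valid ? "true" : "false");

		/* the output must not be truncated */
		assert(n >= 0 && (size_t)n < outsz - len);
		len += (size_t)n;
	}
	return len;
}
```

Calling dump_dirs() before apply_valid_bits() would print every
directory as invalid, which is the problem the deferred dump avoids.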

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 dir.c         | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 dir.h         |  4 +++-
 json-writer.h |  6 ++++++
 read-cache.c  |  2 +-
 4 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index ba4a51c296..f389eee24a 100644
--- a/dir.c
+++ b/dir.c
@@ -19,6 +19,7 @@
 #include "varint.h"
 #include "ewah/ewok.h"
 #include "fsmonitor.h"
+#include "json-writer.h"
 #include "submodule-config.h"
 
 /*
@@ -2826,7 +2827,42 @@ static void load_oid_stat(struct oid_stat *oid_stat, const unsigned char *data,
 	oid_stat->valid = 1;
 }
 
-struct untracked_cache *read_untracked_extension(const void *data, unsigned long sz)
+static void jw_object_oid_stat(struct json_writer *jw, const char *key,
+			       const struct oid_stat *oid_stat)
+{
+	jw_object_inline_begin_object(jw, key);
+	jw_object_bool(jw, "valid", oid_stat->valid);
+	jw_object_string(jw, "oid", oid_to_hex(&oid_stat->oid));
+	jw_object_stat_data(jw, "stat", &oid_stat->stat);
+	jw_end(jw);
+}
+
+static void jw_object_untracked_cache_dir(struct json_writer *jw,
+					  const struct untracked_cache_dir *ucd)
+{
+	int i;
+
+	jw_object_bool(jw, "valid", ucd->valid);
+	jw_object_bool(jw, "check-only", ucd->check_only);
+	jw_object_stat_data(jw, "stat", &ucd->stat_data);
+	jw_object_string(jw, "exclude-oid", oid_to_hex(&ucd->exclude_oid));
+	jw_object_inline_begin_array(jw, "untracked");
+	for (i = 0; i < ucd->untracked_nr; i++)
+		jw_array_string(jw, ucd->untracked[i]);
+	jw_end(jw);
+
+	jw_object_inline_begin_object(jw, "dirs");
+	for (i = 0; i < ucd->dirs_nr; i++) {
+		jw_object_inline_begin_object(jw, ucd->dirs[i]->name);
+		jw_object_untracked_cache_dir(jw, ucd->dirs[i]);
+		jw_end(jw);
+	}
+	jw_end(jw);
+}
+
+struct untracked_cache *read_untracked_extension(const void *data,
+						 unsigned long sz,
+						 struct json_writer *jw)
 {
 	struct untracked_cache *uc;
 	struct read_data rd;
@@ -2864,6 +2900,17 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 	uc->dir_flags = get_be32(next + ouc_offset(dir_flags));
 	exclude_per_dir = (const char *)next + exclude_per_dir_offset;
 	uc->exclude_per_dir = xstrdup(exclude_per_dir);
+
+	if (jw) {
+		jw_object_inline_begin_object(jw, "untracked-cache");
+		jw_object_intmax(jw, "ext-size", sz);
+		jw_object_string(jw, "ident", ident);
+		jw_object_oid_stat(jw, "info/exclude", &uc->ss_info_exclude);
+		jw_object_oid_stat(jw, "excludes-file", &uc->ss_excludes_file);
+		jw_object_intmax(jw, "flags", uc->dir_flags);
+		jw_object_string(jw, "excludes-per-dir", uc->exclude_per_dir);
+	}
+
 	/* NUL after exclude_per_dir is covered by sizeof(*ouc) */
 	next += exclude_per_dir_offset + strlen(exclude_per_dir) + 1;
 	if (next >= end)
@@ -2905,6 +2952,12 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 	ewah_each_bit(rd.sha1_valid, read_oid, &rd);
 	next = rd.data;
 
+	if (jw) {
+		jw_object_inline_begin_object(jw, "root");
+		jw_object_untracked_cache_dir(jw, uc->root);
+		jw_end(jw);
+	}
+
 done:
 	free(rd.ucd);
 	ewah_free(rd.valid);
@@ -2915,6 +2968,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 		free_untracked_cache(uc);
 		uc = NULL;
 	}
+	jw_end_gently(jw);
 	return uc;
 }
 
diff --git a/dir.h b/dir.h
index 680079bbe3..80efdd05c4 100644
--- a/dir.h
+++ b/dir.h
@@ -6,6 +6,8 @@
 #include "cache.h"
 #include "strbuf.h"
 
+struct json_writer;
+
 struct dir_entry {
 	unsigned int len;
 	char name[FLEX_ARRAY]; /* more */
@@ -362,7 +364,7 @@ void untracked_cache_remove_from_index(struct index_state *, const char *);
 void untracked_cache_add_to_index(struct index_state *, const char *);
 
 void free_untracked_cache(struct untracked_cache *);
-struct untracked_cache *read_untracked_extension(const void *data, unsigned long sz);
+struct untracked_cache *read_untracked_extension(const void *data, unsigned long sz, struct json_writer *jw);
 void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked);
 void add_untracked_cache(struct index_state *istate);
 void remove_untracked_cache(struct index_state *istate);
diff --git a/json-writer.h b/json-writer.h
index 3c173647d3..f778e019a2 100644
--- a/json-writer.h
+++ b/json-writer.h
@@ -121,6 +121,12 @@ static inline void jw_object_inline_begin_array_gently(struct json_writer *jw,
 		jw_object_inline_begin_array(jw, name);
 }
 
+static inline void jw_array_inline_begin_object_gently(struct json_writer *jw)
+{
+	if (jw)
+		jw_array_inline_begin_object(jw);
+}
+
 static inline void jw_end_gently(struct json_writer *jw)
 {
 	if (jw)
diff --git a/read-cache.c b/read-cache.c
index 289705b816..d7d9ce7260 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1708,7 +1708,7 @@ static int read_index_extension(struct index_state *istate,
 			return -1;
 		break;
 	case CACHE_EXT_UNTRACKED:
-		istate->untracked = read_untracked_extension(data, sz);
+		istate->untracked = read_untracked_extension(data, sz, istate->jw);
 		break;
 	case CACHE_EXT_FSMONITOR:
 		read_fsmonitor_extension(istate, data, sz);
-- 
2.22.0.rc0.322.g2b0371e29a



* Re: [PATCH 1/8] ls-files: add --json to dump the index
  2019-06-19  9:58 ` [PATCH 1/8] ls-files: add --json to dump the index Nguyễn Thái Ngọc Duy
@ 2019-06-19 10:30   ` Ævar Arnfjörð Bjarmason
  2019-06-19 13:03   ` Derrick Stolee
  1 sibling, 0 replies; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-06-19 10:30 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git


On Wed, Jun 19 2019, Nguyễn Thái Ngọc Duy wrote:

> +		die(_("--show-json cannot be used with other --show- options, or --with-tree"));

Should be --json, not --show-json, right? I assume --show-json is left
over from an earlier version.


* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
                   ` (7 preceding siblings ...)
  2019-06-19  9:58 ` [PATCH 8/8] dir.c: dump "UNTR" " Nguyễn Thái Ngọc Duy
@ 2019-06-19 11:58 ` Derrick Stolee
  2019-06-19 12:42   ` Duy Nguyen
  2019-06-19 19:17 ` Jeff King
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee @ 2019-06-19 11:58 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git, kewillf

On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
> This is probably just my itch. Every time I have to do something with
> the index, I need to add a little bit code here, a little bit there to
> get a better "view" of the index.
> 
> This solves it for me. It allows me to see pretty much everything in the
> index (except really low detail stuff like pathname compression). It's
> readable by human, but also easy to parse if you need to do statistics
> and stuff. You could even do a "diff" between two indexes.
> 
> I'm not really sure if anybody else finds this useful. Because if not,
> I guess there's not much point trying to merge it to git.git just for a
> single user. Maintaining off tree is still a pain for me, but I think
> I can manage it.

I think we (Microsoft/VFS for Git engineers) would use this tool, as we
frequently need to diagnose something that went wrong in a user's index.
Kevin Willford built a tool to search the index and figure out what's
going on, but I'm not sure it parses all of the new extensions or was
updated to parse the v5 index.

Having a translation from the internal index format to an easier-to-parse
format is valuable.

Thanks,
-Stolee


* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-19 11:58 ` [PATCH 0/8] Add 'ls-files --json' to dump the index in json Derrick Stolee
@ 2019-06-19 12:42   ` Duy Nguyen
  2019-06-19 12:48     ` Derrick Stolee
  0 siblings, 1 reply; 36+ messages in thread
From: Duy Nguyen @ 2019-06-19 12:42 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List, Kevin Willford

On Wed, Jun 19, 2019 at 6:58 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
> > This is probably just my itch. Every time I have to do something with
> > the index, I need to add a little bit code here, a little bit there to
> > get a better "view" of the index.
> >
> > This solves it for me. It allows me to see pretty much everything in the
> > index (except really low detail stuff like pathname compression). It's
> > readable by human, but also easy to parse if you need to do statistics
> > and stuff. You could even do a "diff" between two indexes.
> >
> > I'm not really sure if anybody else finds this useful. Because if not,
> > I guess there's not much point trying to merge it to git.git just for a
> > single user. Maintaining off tree is still a pain for me, but I think
> > I can manage it.
>
> I think we (Microsoft/VFS for Git engineers) would use this tool, as we
> frequently need to diagnose something that went wrong in a user's index.
> Kevin Willford built a tool to search the index and figure out what's
> going on, but I'm not sure it parses all of the new extensions or was
> updated to parse the v5 index.

OK I suggest you try it out and see if it really fits your internal
tools. I wanted to balance between manual inspection and automation so
the output may not be the best for tools. I also try not to freeze the
format for more wiggle room, which would be fine for one-time scripts,
but if you want to have real tools depend on it, we may have to look
harder at the output format and make sure it's good enough for some
time, and have some documentation.

Also, I don't suppose it matters, but just for the record I don't care
at all about --json performance. I suppose Jeff's json writer does not
cache the entire json output in memory, so dumping giant index files
is fine. But some other things, like reading the index with multiple
threads, are also disabled.
-- 
Duy


* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-19 12:42   ` Duy Nguyen
@ 2019-06-19 12:48     ` Derrick Stolee
  0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2019-06-19 12:48 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Kevin Willford

On 6/19/2019 8:42 AM, Duy Nguyen wrote:
> On Wed, Jun 19, 2019 at 6:58 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
>>> This is probably just my itch. Every time I have to do something with
>>> the index, I need to add a little bit code here, a little bit there to
>>> get a better "view" of the index.
>>>
>>> This solves it for me. It allows me to see pretty much everything in the
>>> index (except really low detail stuff like pathname compression). It's
>>> readable by human, but also easy to parse if you need to do statistics
>>> and stuff. You could even do a "diff" between two indexes.
>>>
>>> I'm not really sure if anybody else finds this useful. Because if not,
>>> I guess there's not much point trying to merge it to git.git just for a
>>> single user. Maintaining off tree is still a pain for me, but I think
>>> I can manage it.
>>
>> I think we (Microsoft/VFS for Git engineers) would use this tool, as we
>> frequently need to diagnose something that went wrong in a user's index.
>> Kevin Willford built a tool to search the index and figure out what's
>> going on, but I'm not sure it parses all of the new extensions or was
>> updated to parse the v5 index.
> 
> OK I suggest you try it out and see if it really fits your internal
> tools. I wanted to balance between manual inspection and automation so
> the output may not be the best for tools. I also try not to freeze the
> format for more wiggle room, which would be fine for one-time scripts,
> but if you want to have real tools depend on it, we may have to look
> harder at the output format and make sure it's good enough for some
> time, and have some documentation.
> 
> Also, I don't suppose it matters, but just for the record I don't care
> at all about --json performance. I suppose Jeff's json writer does not
> cache the entire json output in memory, so dumping giant index files
> is fine. But some other things, like reading the index with multiple
> threads, are also disabled.

Performance is not critical here, and in fact reading the index would
certainly become slower because of the extra work done while parsing.
However, I think using JSON as a translation layer will make any tools
that consume the JSON more resilient to future index format updates.
That stability is valuable. Even though the JSON format is not
guaranteed to stay the same, it is easier to update an object model to
a changed JSON format than to write a new binary parser.

Thanks,
-Stolee


* Re: [PATCH 1/8] ls-files: add --json to dump the index
  2019-06-19  9:58 ` [PATCH 1/8] ls-files: add --json to dump the index Nguyễn Thái Ngọc Duy
  2019-06-19 10:30   ` Ævar Arnfjörð Bjarmason
@ 2019-06-19 13:03   ` Derrick Stolee
  2019-06-21 13:04     ` Johannes Schindelin
  2019-06-24 12:50     ` Duy Nguyen
  1 sibling, 2 replies; 36+ messages in thread
From: Derrick Stolee @ 2019-06-19 13:03 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git

On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
> So far we don't have a command to basically dump the index file out,
> with all its glory details. Checking some info, for example, stat
> time, usually involves either writing new code or firing up "xxd" and
> decoding values by yourself.
> 
> This --json is supposed to help that. It dumps the index in a human
> readable format but also easy to be processed with tools. And it will
> print almost enough info to reconstruct the index later.

In an earlier message, I stated that I like the idea of this feature.
I know that you wanted to get that feedback before working too hard on
the patch series, so now that interest is declared, please add some tests
that verify this output looks as you expect.

> In this patch we only dump the main part, not extensions. But at the
> end of the series, the entire index is dumped. The end result could be
> very verbose even on a small repository such as git.git.

I would expect this commit to include a complete "output matches expected"
test, and the later patches can update the index to include these
extensions then verify that their sections appear in the output using
a grep.

> +--json::
> +	Dump the entire index content in JSON format. This is for
> +	debugging purposes and the JSON structure may change from time
> +	to time.
> +

"...purposes and the JSON structure may change from time to time" may be better
as "...purposes. The JSON structure is subject to change."

> +		OPT_BOOL(0, "json", &show_json,
> +			N_("dump index content in json format")),

We should probably use "JSON" here in the help text.

> -	show_files(the_repository, &dir);
> -
> -	if (show_resolve_undo)
> -		show_ru_info(the_repository->index);
> +	if (!show_json) {
> +		show_files(the_repository, &dir);
> +
> +		if (show_resolve_undo)
> +			show_ru_info(the_repository->index);
> +	} else {
> +		struct json_writer jw = JSON_WRITER_INIT;
> +
> +		discard_index(the_repository->index);
> +		the_repository->index->jw = &jw;
> +		if (repo_read_index(the_repository) < 0)
> +			die("index file corrupt");
> +		puts(jw.json.buf);
> +		the_repository->index->jw = NULL;
> +		jw_release(&jw);
> +	}
>  
>  	if (ps_matched) {
>  		int bad;

I see this 'ps_matched' condition at the end, which is related to
the '--error-unmatch' option. I added "--error-unmatch foo" to my
command and got the appropriate error message:

  error: pathspec 'foo' did not match any file(s) known to git
  Did you forget to 'git add'?

This was sent to stderr while the JSON was in stdout, so this should
be appropriate to allow both options. Just pointing it out to make
sure this is intended.

> +void jw_object_stat_data(struct json_writer *jw, const char *name,
> +			 const struct stat_data *sd)
> +{
> +	jw_object_inline_begin_object(jw, name);
> +	jw_object_intmax(jw, "st_ctime.sec", sd->sd_ctime.sec);
> +	jw_object_intmax(jw, "st_ctime.nsec", sd->sd_ctime.nsec);
> +	jw_object_intmax(jw, "st_mtime.sec", sd->sd_mtime.sec);
> +	jw_object_intmax(jw, "st_mtime.nsec", sd->sd_mtime.nsec);
> +	jw_object_intmax(jw, "st_dev", sd->sd_dev);
> +	jw_object_intmax(jw, "st_ino", sd->sd_ino);
> +	jw_object_intmax(jw, "st_uid", sd->sd_uid);
> +	jw_object_intmax(jw, "st_gid", sd->sd_gid);
> +	jw_object_intmax(jw, "st_size", sd->sd_size);
> +	jw_end(jw);
> +}

If these are all part of the same object, are the "st_" prefixes
necessary for every member?

> +	/*
> +	 * again redundant info, just so you don't have to decode
> +	 * flags values manually
> +	 */
> +	if (ce->ce_flags & CE_VALID)
> +		jw_object_true(jw, "assume-unchanged");
> +	if (ce->ce_flags & CE_INTENT_TO_ADD)
> +		jw_object_true(jw, "intent-to-add");
> +	if (ce->ce_flags & CE_SKIP_WORKTREE)
> +		jw_object_true(jw, "skip-worktree");
> +	if (ce_stage(ce))
> +		jw_object_intmax(jw, "stage", ce_stage(ce));

I'm really glad these flags are getting expanded! Much easier to
understand what's going on this way.

> +	if (istate->jw) {
> +		jw_object_begin(istate->jw, jw_pretty);
> +		jw_object_intmax(istate->jw, "version", istate->version);
> +		jw_object_string(istate->jw, "oid", oid_to_hex(&istate->oid));
> +		jw_object_intmax(istate->jw, "st_mtime.sec", istate->timestamp.sec);
> +		jw_object_intmax(istate->jw, "st_mtime.nsec", istate->timestamp.nsec);

Here, the "st_" prefixes are not on every member, but would it
be confusing if they were not there? Also, including a "." in
a member name may be troublesome for JSON, as that typically
means we are accessing a member of an object. Perhaps use _sec
and _nsec here and in the earlier stat_data block.
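
As a rough sketch of that naming (a hypothetical helper written with
plain snprintf, not the json-writer API from the patch), the two
timestamp members would come out as flat "<prefix>_sec" and
"<prefix>_nsec" keys:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of json-writer: format a timestamp as
 * two flat members, "<prefix>_sec" and "<prefix>_nsec".
 * Returns the number of bytes written.
 */
static int format_time_members(char *out, size_t outsz, const char *prefix,
			       unsigned int sec, unsigned int nsec)
{
	int n = snprintf(out, outsz, "\"%s_sec\":%u,\"%s_nsec\":%u",
			 prefix, sec, prefix, nsec);

	assert(n >= 0 && (size_t)n < outsz);	/* output must not be truncated */
	return n;
}
```

With prefix "mtime" this yields "mtime_sec" and "mtime_nsec", which a
consumer can map to ordinary field names without special handling of
the "." character.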

Thanks,
-Stolee


* Re: [PATCH 4/8] resolve-undo.c: dump "REUC" extension as json
  2019-06-19  9:58 ` [PATCH 4/8] resolve-undo.c: dump "REUC" " Nguyễn Thái Ngọc Duy
@ 2019-06-19 13:16   ` Derrick Stolee
  0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2019-06-19 13:16 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git

On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:

> +	if (jw) {
> +		jw_object_inline_begin_object(jw, "resolve-undo");
> +		jw_object_intmax(jw, "ext-size", size);
> +		jw_object_inline_begin_array(jw, "entries");
> +	}

While reading this block, I noticed the use of hyphens in the
member names could cause some problems when translating into
object models in some languages. While this is valid JSON, I
found helpful recommendations in the Google JSON Style Guide [1]
that could apply here and elsewhere. Specifically, this
recommendation:

  "Property names must be camel-cased, ascii strings."

Treating JSON members as camel-cased variable names would
promote consumption by third-party tools.
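
To make the suggestion concrete, here is a small sketch of the
mapping, assuming a hypothetical to_camel_case() helper that is not
part of the series. It drops each hyphen and upper-cases the character
that follows it:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy "name" into "out", dropping each '-' and
 * upper-casing the character that follows it ("ext-size" -> "extSize").
 */
static void to_camel_case(char *out, size_t outsz, const char *name)
{
	size_t len = 0;
	int upper_next = 0;

	assert(outsz > 0);
	for (; *name; name++) {
		if (*name == '-') {
			upper_next = 1;
			continue;
		}
		assert(len + 1 < outsz);	/* leave room for the NUL */
		out[len++] = upper_next
			? (char)toupper((unsigned char)*name)
			: *name;
		upper_next = 0;
	}
	out[len] = '\0';
}
```

Under this mapping "ext-size" becomes "extSize" and
"index-entry-offsets" becomes "indexEntryOffsets", while names without
hyphens such as "entries" are left unchanged.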

Thanks,
-Stolee

[1] https://google.github.io/styleguide/jsoncstyleguide.xml?showone=Property_Name_Format#Property_Name_Format


* Re: [PATCH 6/8] read-cache.c: dump "IEOT" extension as json
  2019-06-19  9:58 ` [PATCH 6/8] read-cache.c: dump "IEOT" " Nguyễn Thái Ngọc Duy
@ 2019-06-19 13:18   ` Derrick Stolee
  2019-06-19 13:24     ` Duy Nguyen
  0 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee @ 2019-06-19 13:18 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git

On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
> @@ -2266,7 +2271,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>  	 * to multi-thread the reading of the cache entries.
>  	 */
>  	if (extension_offset && nr_threads > 1)
> -		ieot = read_ieot_extension(mmap, mmap_size, extension_offset);
> +		ieot = read_ieot_extension(mmap, mmap_size, extension_offset, NULL);

I tried applying this series on top of v2.22.0 and ran into an issue
on this patch, and the message seemed to imply the problem was at this
block. I couldn't figure out what was wrong, but maybe the series is
based on a different commit?

That said, I applied the previous patches, compiled, and manually
tested those features. Seemed to be working as advertised.

Thanks,
-Stolee


* Re: [PATCH 6/8] read-cache.c: dump "IEOT" extension as json
  2019-06-19 13:18   ` Derrick Stolee
@ 2019-06-19 13:24     ` Duy Nguyen
  2019-06-19 14:26       ` Derrick Stolee
  0 siblings, 1 reply; 36+ messages in thread
From: Duy Nguyen @ 2019-06-19 13:24 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

On Wed, Jun 19, 2019 at 8:18 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
> > @@ -2266,7 +2271,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
> >        * to multi-thread the reading of the cache entries.
> >        */
> >       if (extension_offset && nr_threads > 1)
> > -             ieot = read_ieot_extension(mmap, mmap_size, extension_offset);
> > +             ieot = read_ieot_extension(mmap, mmap_size, extension_offset, NULL);
>
> I tried applying this series on top of v2.22.0 and ran into an issue
> on this patch, and the message seemed to imply the problem was at this
> block. I couldn't figure out what was wrong, but maybe the series is
> based on a different commit?

It's on 'master', a6a95cd1b4 (The second batch, 2019-06-17). There are
a couple of patches since v2.22.0 that touch read-cache.c, but they
don't touch these lines explicitly...
-- 
Duy


* Re: [PATCH 6/8] read-cache.c: dump "IEOT" extension as json
  2019-06-19 13:24     ` Duy Nguyen
@ 2019-06-19 14:26       ` Derrick Stolee
  0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2019-06-19 14:26 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On 6/19/2019 9:24 AM, Duy Nguyen wrote:
> On Wed, Jun 19, 2019 at 8:18 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
>>> @@ -2266,7 +2271,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>>>        * to multi-thread the reading of the cache entries.
>>>        */
>>>       if (extension_offset && nr_threads > 1)
>>> -             ieot = read_ieot_extension(mmap, mmap_size, extension_offset);
>>> +             ieot = read_ieot_extension(mmap, mmap_size, extension_offset, NULL);
>>
>> I tried applying this series on top of v2.22.0 and ran into an issue
>> on this patch, and the message seemed to imply the problem was at this
>> block. I couldn't figure out what was wrong, but maybe the series is
>> based on a different commit?
> 
> It's on 'master', a6a95cd1b4 (The second batch, 2019-06-17). There are
> a couple of patches since v2.22.0 that touch read-cache.c, but they
> don't touch these lines explicitly...

Thanks, I should have tried from master myself. Starting there worked.


* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
                   ` (8 preceding siblings ...)
  2019-06-19 11:58 ` [PATCH 0/8] Add 'ls-files --json' to dump the index in json Derrick Stolee
@ 2019-06-19 19:17 ` Jeff King
  2019-06-21  8:37   ` Duy Nguyen
  2019-06-21 13:16   ` Johannes Schindelin
  2019-06-20  4:00 ` Junio C Hamano
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 36+ messages in thread
From: Jeff King @ 2019-06-19 19:17 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Wed, Jun 19, 2019 at 04:58:50PM +0700, Nguyễn Thái Ngọc Duy wrote:

> This is probably just my itch. Every time I have to do something with
> the index, I need to add a little bit code here, a little bit there to
> get a better "view" of the index.
> 
> This solves it for me. It allows me to see pretty much everything in the
> index (except really low detail stuff like pathname compression). It's
> readable by human, but also easy to parse if you need to do statistics
> and stuff. You could even do a "diff" between two indexes.
> 
> I'm not really sure if anybody else finds this useful. Because if not,
> I guess there's not much point trying to merge it to git.git just for a
> single user. Maintaining off tree is still a pain for me, but I think
> I can manage it.

I don't have any particular use for this, but I am all in favor of tools
that make it easier to access and analyze information kept in our
on-disk formats (some of this is available via --debug, I think, but
AFAIK most of the extension bits are not).

And I'd rather see something like JSON than inventing yet another ad-hoc
output format.

I think your warning in the manpage that this is for debugging is fine,
as it does not put us on the hook for maintaining the feature nor its
format forever. We might want to call it "--debug=json" or something,
though, in case we do want real stable json support later (though of
course we would be free to steal the option then, since we're making no
promises).

-Peff


* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
                   ` (9 preceding siblings ...)
  2019-06-19 19:17 ` Jeff King
@ 2019-06-20  4:00 ` Junio C Hamano
  2019-06-20 19:12 ` Jeff Hostetler
  2019-06-21 23:30 ` brian m. carlson
  12 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2019-06-20  4:00 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This is probably just my itch. Every time I have to do something with
> the index, I need to add a little bit code here, a little bit there to
> get a better "view" of the index.

;-)  

JSON is not particularly my cup of tea, but it is better than many
other things for exactly one reason (everybody and their dog has
heard of it), and it is certainly far superior to inventing our own
ad-hoc format.

Thanks for working on this (I do not expect I would see an immediate
need for this myself, though).



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
                   ` (10 preceding siblings ...)
  2019-06-20  4:00 ` Junio C Hamano
@ 2019-06-20 19:12 ` Jeff Hostetler
  2019-06-21 23:30 ` brian m. carlson
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Hostetler @ 2019-06-20 19:12 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git



On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
> This is probably just my itch. Every time I have to do something with
> the index, I need to add a little bit code here, a little bit there to
> get a better "view" of the index.
> 
> This solves it for me. It allows me to see pretty much everything in the
> index (except really low detail stuff like pathname compression). It's
> readable by human, but also easy to parse if you need to do statistics
> and stuff. You could even do a "diff" between two indexes.
> 
> I'm not really sure if anybody else finds this useful. Because if not,
> I guess there's not much point trying to merge it to git.git just for a
> single user. Maintaining off tree is still a pain for me, but I think
> I can manage it.
> 
> Nguyễn Thái Ngọc Duy (8):
>    ls-files: add --json to dump the index
>    split-index.c: dump "link" extension as json
>    fsmonitor.c: dump "FSMN" extension as json
>    resolve-undo.c: dump "REUC" extension as json
>    read-cache.c: dump "EOIE" extension as json
>    read-cache.c: dump "IEOT" extension as json
>    cache-tree.c: dump "TREE" extension as json
>    dir.c: dump "UNTR" extension as json
> 
>   Documentation/git-ls-files.txt |   5 ++
>   builtin/ls-files.c             |  30 +++++--
>   cache-tree.c                   |  41 ++++++++--
>   cache-tree.h                   |   5 +-
>   cache.h                        |   2 +
>   dir.c                          |  56 ++++++++++++-
>   dir.h                          |   4 +-
>   fsmonitor.c                    |   9 +++
>   json-writer.c                  |  30 +++++++
>   json-writer.h                  |  29 +++++++
>   read-cache.c                   | 139 ++++++++++++++++++++++++++++++---
>   resolve-undo.c                 |  36 ++++++++-
>   resolve-undo.h                 |   4 +-
>   split-index.c                  |  13 ++-
>   14 files changed, 376 insertions(+), 27 deletions(-)
> 

Thanks for working on this!  I've been wanting to do something
like this for a while.  I too am tired of digging thru hex dumps
or "od" output whenever I have an odd problem to investigate.
This will certainly help.

Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-19 19:17 ` Jeff King
@ 2019-06-21  8:37   ` Duy Nguyen
  2019-06-21 20:48     ` Jeff King
  2019-06-21 13:16   ` Johannes Schindelin
  1 sibling, 1 reply; 36+ messages in thread
From: Duy Nguyen @ 2019-06-21  8:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On Thu, Jun 20, 2019 at 2:17 AM Jeff King <peff@peff.net> wrote:
> I think your warning in the manpage that this is for debugging is fine,
> as it does not put us on the hook for maintaining the feature nor its
> format forever. We might want to call it "--debug=json" or something,

Hmm.. does it mean we make --debug PARSE_OPT_OPTARG? In other words,
"--debug" still means "text", --debug=json is obvious, but "--debug
json" means "text" debug with pathspec "json". Which is really
horrible in my opinion.
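
A minimal sketch of that ambiguity, in Python rather than C. This is not
Git's parse-options code; parse_debug() is a hypothetical helper that only
mimics an option whose argument is optional and has to be attached with "=".

```python
def parse_debug(argv):
    """Return (debug_format, pathspecs) for a list of arguments.

    debug_format is None when --debug is absent. This only imitates the
    "optional argument must be stuck to the option" rule discussed above.
    """
    debug_format = None
    pathspecs = []
    for arg in argv:
        if arg == "--debug":
            # Bare option: fall back to the existing text output.
            debug_format = "text"
        elif arg.startswith("--debug="):
            # Stuck form: everything after "=" is the format name.
            debug_format = arg[len("--debug="):]
        else:
            # Anything else, including a separate "json" word, is a pathspec.
            pathspecs.append(arg)
    return debug_format, pathspecs
```

Under these rules "--debug json" selects the text dump and treats "json"
as a pathspec, which is the surprising case.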

Or is it ok to just make the argument mandatory? That would be a
behavior change, but I suppose --debug is a thing only we use and
could still be a safe thing to do...

> though, in case we do want real stable json support later (though of
> course we would be free to steal the option then, since we're making no
> promises).
>
> -Peff



-- 
Duy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/8] ls-files: add --json to dump the index
  2019-06-19 13:03   ` Derrick Stolee
@ 2019-06-21 13:04     ` Johannes Schindelin
  2019-06-24 12:50     ` Duy Nguyen
  1 sibling, 0 replies; 36+ messages in thread
From: Johannes Schindelin @ 2019-06-21 13:04 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Nguyễn Thái Ngọc Duy, git

Hi,

On Wed, 19 Jun 2019, Derrick Stolee wrote:

> On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
>
> > +--json::
> > +	Dump the entire index content in JSON format. This is for
> > +	debugging purposes and the JSON structure may change from time
> > +	to time.
> > +
>
> "...purposes and the JSON structure may change from time to time" may be
> better as "...purposes. The JSON structure is subject to change."

It would probably make even more sense to mark this as an experimental
feature for now (i.e. prefix the description with "(EXPERIMENTAL) "), so
that users will have a harder time missing that vague statement at the
end.

Once the feature has stabilized enough, it would probably make sense to start
versioning the JSON format (`--json[=<version>]`, defaulting to the
latest).

That would make this a pretty useful feature not only for debugging, I
would imagine, but really would set a precedent for a better "API" for
3rd-party applications to use than the current one.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-19 19:17 ` Jeff King
  2019-06-21  8:37   ` Duy Nguyen
@ 2019-06-21 13:16   ` Johannes Schindelin
  2019-06-21 13:49     ` Duy Nguyen
  2019-06-21 20:51     ` Jeff King
  1 sibling, 2 replies; 36+ messages in thread
From: Johannes Schindelin @ 2019-06-21 13:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git

Hi Peff,

On Wed, 19 Jun 2019, Jeff King wrote:

> On Wed, Jun 19, 2019 at 04:58:50PM +0700, Nguyễn Thái Ngọc Duy wrote:
>
> > This is probably just my itch. Every time I have to do something with
> > the index, I need to add a little bit code here, a little bit there to
> > get a better "view" of the index.
> >
> > This solves it for me. It allows me to see pretty much everything in the
> > index (except really low detail stuff like pathname compression). It's
> > readable by human, but also easy to parse if you need to do statistics
> > and stuff. You could even do a "diff" between two indexes.
> >
> > I'm not really sure if anybody else finds this useful. Because if not,
> > I guess there's not much point trying to merge it to git.git just for a
> > single user. Maintaining off tree is still a pain for me, but I think
> > I can manage it.
>
> I don't have any particular use for this, but I am all in favor of tools
> that make it easier to access and analyze information kept in our
> on-disk formats (some of this is available via --debug, I think, but
> AFAIK most of the extension bits are not).
>
> And I'd rather see something like JSON than inventing yet another ad-hoc
> output format.
>
> I think your warning in the manpage that this is for debugging is fine,
> as it does not put us on the hook for maintaining the feature nor its
> format forever. We might want to call it "--debug=json" or something,
> though, in case we do want real stable json support later (though of
> course we would be free to steal the option then, since we're making no
> promises).

Traditionally, we have not catered well to 3rd-party applications in Git,
and this JSON format would provide a way out of that problem.

So I would like *not* to lock the door on letting this feature stabilize
organically.

I'd be much more in favor of `--json[=<version>]`, with an initial version
of 0 to indicate that it really is unstable for now.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-21 13:16   ` Johannes Schindelin
@ 2019-06-21 13:49     ` Duy Nguyen
  2019-06-21 15:10       ` Junio C Hamano
  2019-06-24  9:33       ` Johannes Schindelin
  2019-06-21 20:51     ` Jeff King
  1 sibling, 2 replies; 36+ messages in thread
From: Duy Nguyen @ 2019-06-21 13:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, Git Mailing List

On Fri, Jun 21, 2019 at 8:16 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:

> > I think your warning in the manpage that this is for debugging is fine,
> > as it does not put us on the hook for maintaining the feature nor its
> > format forever. We might want to call it "--debug=json" or something,
> > though, in case we do want real stable json support later (though of
> > course we would be free to steal the option then, since we're making no
> > promises).
>
> Traditionally, we have not catered well to 3rd-party applications in Git,
> and this JSON format would provide a way out of that problem.
>
> So I would like *not* to lock the door on letting this feature stabilize
> organically.
>
> I'd be much more in favor of `--json[=<version>]`, with an initial version
> of 0 to indicate that it really is unstable for now.

Considering the amount of code to output these, supporting multiple
formats would be a nightmare. I may be ok with versioning the output
so the tools know what format they need to deal with, but I'd rather
support just one version. For third parties wanting to dig deep, I
think libgit2 would be a much better fit.
-- 
Duy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-21 13:49     ` Duy Nguyen
@ 2019-06-21 15:10       ` Junio C Hamano
  2019-06-21 20:52         ` Jeff King
  2019-06-24  9:33       ` Johannes Schindelin
  1 sibling, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2019-06-21 15:10 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Johannes Schindelin, Jeff King, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> Considering the amount of code to output these, supporting multiple
> formats would be a nightmare. I may be ok with versioning the output
> so the tools know what format they need to deal with, but I'd rather
> support just one version. For third parties wanting to dig deep, I
> think libgit2 would be a much better fit.

Yeah, I think starting with --debug=json (or --debug-json) until we
see some stability in the output and get comfortable with the idea of
"version X" meaning what we output at that point, and then renaming
it to "--json" with "version: 1" in the output stream so that third
parties can use it (and interpret it according to version 1 rules) is
the way to go.  Third-party tools are welcome to read --debug-json
output as an early-adoption practice waiting for the real thing, but
we do not want to be locked into a schema too early before we are
ready.
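
For illustration only, here is roughly how a third-party reader could
act on such a version marker. It is a sketch in Python under stated
assumptions: the top-level "version" key and the load_dump() helper are
hypothetical, since no schema has been settled in this thread.

```python
import json

# Versions this hypothetical reader knows how to interpret.
SUPPORTED_VERSIONS = {1}

def load_dump(text):
    """Parse a dump and refuse any version we do not understand."""
    doc = json.loads(text)
    # Assumed layout: a top-level object carrying a "version" key.
    version = doc.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported dump version: {version!r}")
    return doc
```

A reader written this way fails loudly on an unversioned debug dump
instead of misinterpreting it.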

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-21  8:37   ` Duy Nguyen
@ 2019-06-21 20:48     ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2019-06-21 20:48 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Fri, Jun 21, 2019 at 03:37:45PM +0700, Duy Nguyen wrote:

> On Thu, Jun 20, 2019 at 2:17 AM Jeff King <peff@peff.net> wrote:
> > I think your warning in the manpage that this is for debugging is fine,
> > as it does not put us on the hook for maintaining the feature nor its
> > format forever. We might want to call it "--debug=json" or something,
> 
> Hmm.. does it mean we make --debug PARSE_OPT_OPTARG? In other words,
> "--debug" still means "text", --debug=json is obvious, but "--debug
> json" means "text" debug with pathspec "json". Which is really
> horrible in my opinion.

Yeah, that's the nature of OPTARG. ;)

> Or is it ok to just make the argument mandatory? That would be a
> behavior change, but I suppose --debug is a thing only we use and
> could still be a safe thing to do...

Yeah, I think that would be perfectly fine (or you could just call it
--debug-json as a new option, if you didn't want to make people do
--debug=text for the existing behavior).

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-21 13:16   ` Johannes Schindelin
  2019-06-21 13:49     ` Duy Nguyen
@ 2019-06-21 20:51     ` Jeff King
  2019-06-24  9:52       ` Johannes Schindelin
  1 sibling, 1 reply; 36+ messages in thread
From: Jeff King @ 2019-06-21 20:51 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Nguyễn Thái Ngọc Duy, git

On Fri, Jun 21, 2019 at 03:16:52PM +0200, Johannes Schindelin wrote:

> > I think your warning in the manpage that this is for debugging is fine,
> > as it does not put us on the hook for maintaining the feature nor its
> > format forever. We might want to call it "--debug=json" or something,
> > though, in case we do want real stable json support later (though of
> > course we would be free to steal the option then, since we're making no
> > promises).
> 
> Traditionally, we have not catered well to 3rd-party applications in Git,
> and this JSON format would provide a way out of that problem.
> 
> So I would like *not* to lock the door on letting this feature stabilize
> organically.

I'd like it to stabilize organically, too, but my thinking was that we'd
wait a while and then promote it to a stable name eventually.

> I'd be much more in favor of `--json[=<version>]`, with an initial version
> of 0 to indicate that it really is unstable for now.

That's OK with me, too, if you think "0" indicates that sufficiently
(we've used "v0" in a lot of other places to refer to stable protocols,
like the git:// one). Maybe it's OK with some documentation making it
clear.

I'm not sure whether we want to be locked into supporting this v0
forever or not (though maybe it would not be such a burden).

I think JSON-based output also has the potential to need fewer bumps.
It's syntactically stable, so it's really just about our schema. And
it's easy to say "newer versions of Git may produce new keys; you can
ignore them", as long as we do not change the meaning of existing keys.
That might be an easier promise to make.
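
As a rough sketch of what that promise asks of a consumer (Python, with
made-up key names; "name" and "mode" are assumptions for the example and
not the actual output of this series):

```python
import json

# Keys this hypothetical consumer understands; anything else is skipped.
KNOWN_KEYS = {"name", "mode"}

def read_entry(raw):
    """Parse one JSON object and keep only the keys we know about."""
    obj = json.loads(raw)
    # Unknown keys from a newer producer are dropped rather than rejected.
    return {key: value for key, value in obj.items() if key in KNOWN_KEYS}
```

This only stays correct for as long as the meaning of the existing keys
does not change.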

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-21 15:10       ` Junio C Hamano
@ 2019-06-21 20:52         ` Jeff King
  2019-06-24  9:35           ` Johannes Schindelin
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff King @ 2019-06-21 20:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Johannes Schindelin, Git Mailing List

On Fri, Jun 21, 2019 at 08:10:58AM -0700, Junio C Hamano wrote:

> Duy Nguyen <pclouds@gmail.com> writes:
> 
> > Considering the amount of code to output these, supporting multiple
> > formats would be a nightmare. I may be ok with versioning the output
> > so the tools know what format they need to deal with, but I'd rather
> > support just one version. For third parties wanting to dig deep, I
> > think libgit2 would be a much better fit.
> 
> Yeah, I think starting with --debug=json (or --debug-json) until we
> see some stability in the output and get comfortable with the idea of
> "version X" meaning what we output at that point, and then renaming
> it to "--json" with "version: 1" in the output stream so that third
> parties can use it (and interpret it according to version 1 rules) is
> the way to go.  Third-party tools are welcome to read --debug-json
> output as an early-adoption practice waiting for the real thing, but
> we do not want to be locked into a schema too early before we are
> ready.

I should have read the whole thread before responding. I made a similar
comment to Dscho, so I guess that is now two of us. :)

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
                   ` (11 preceding siblings ...)
  2019-06-20 19:12 ` Jeff Hostetler
@ 2019-06-21 23:30 ` brian m. carlson
  2019-06-22  2:54   ` Duy Nguyen
  12 siblings, 1 reply; 36+ messages in thread
From: brian m. carlson @ 2019-06-21 23:30 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On 2019-06-19 at 09:58:50, Nguyễn Thái Ngọc Duy wrote:
> This is probably just my itch. Every time I have to do something with
> the index, I need to add a little bit code here, a little bit there to
> get a better "view" of the index.
> 
> This solves it for me. It allows me to see pretty much everything in the
> index (except really low detail stuff like pathname compression). It's
> readable by human, but also easy to parse if you need to do statistics
> and stuff. You could even do a "diff" between two indexes.
> 
> I'm not really sure if anybody else finds this useful. Because if not,
> I guess there's not much point trying to merge it to git.git just for a
> single user. Maintaining off tree is still a pain for me, but I think
> I can manage it.

I'm generally in favor of this, but we need to document what this does
when it encounters paths that are not valid UTF-8. (Ideally, the answer
is, "die()", but I suspect the answer will be "silently produce invalid
output".) Those can of course occur on Unix systems, but also on
Windows, where unpaired surrogates can occur.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-21 23:30 ` brian m. carlson
@ 2019-06-22  2:54   ` Duy Nguyen
  0 siblings, 0 replies; 36+ messages in thread
From: Duy Nguyen @ 2019-06-22  2:54 UTC (permalink / raw)
  To: brian m. carlson, Nguyễn Thái Ngọc Duy,
	Git Mailing List

On Sat, Jun 22, 2019 at 6:31 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2019-06-19 at 09:58:50, Nguyễn Thái Ngọc Duy wrote:
> > This is probably just my itch. Every time I have to do something with
> > the index, I need to add a little bit code here, a little bit there to
> > get a better "view" of the index.
> >
> > This solves it for me. It allows me to see pretty much everything in the
> > index (except really low detail stuff like pathname compression). It's
> > readable by human, but also easy to parse if you need to do statistics
> > and stuff. You could even do a "diff" between two indexes.
> >
> > I'm not really sure if anybody else finds this useful. Because if not,
> > I guess there's not much point trying to merge it to git.git just for a
> > single user. Maintaining off tree is still a pain for me, but I think
> > I can manage it.
>
> I'm generally in favor of this, but we need to document what this does
> when it encounters paths that are not valid UTF-8. (Ideally, the answer
> is, "die()", but I suspect the answer will be "silently produce invalid
> output".)

I think you're right, we don't assume anything when writing json
strings, so it's not going to be utf-8 (or die) if the path is also
not valid utf-8. The good thing is all this could be done in just one
place, append_quoted_string(), if someone needs to. I'll just go
document the fact that we may produce invalid UTF-8.
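
To show what that means on the consuming side, a small sketch in Python.
The "name" key and the sample bytes are made up for the example; the
point is only how a strict parser and a lenient one react to a path that
is not valid UTF-8.

```python
import json

# 0xE9 is Latin-1 "e with acute"; on its own it is not valid UTF-8.
# The "name" key is hypothetical, not the real schema.
RAW = b'{"name": "caf\xe9.txt"}'

def strict_parse(data: bytes):
    # json.loads() decodes bytes input itself and raises
    # UnicodeDecodeError when the bytes are not valid UTF-8.
    return json.loads(data)

def lenient_parse(data: bytes):
    # surrogateescape maps each undecodable byte to a lone surrogate,
    # so the original bytes can be recovered afterwards.
    return json.loads(data.decode("utf-8", errors="surrogateescape"))

def path_bytes(name: str) -> bytes:
    # Reverse the surrogateescape mapping to get the on-disk path bytes.
    return name.encode("utf-8", errors="surrogateescape")
```

A strict reader rejects the whole dump, while the lenient one can still
round-trip the path, so documenting the behaviour matters to both.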
-- 
Duy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-21 13:49     ` Duy Nguyen
  2019-06-21 15:10       ` Junio C Hamano
@ 2019-06-24  9:33       ` Johannes Schindelin
  2019-06-24  9:35         ` Duy Nguyen
  1 sibling, 1 reply; 36+ messages in thread
From: Johannes Schindelin @ 2019-06-24  9:33 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, Git Mailing List

Hi Duy,

On Fri, 21 Jun 2019, Duy Nguyen wrote:

> On Fri, Jun 21, 2019 at 8:16 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > > I think your warning in the manpage that this is for debugging is fine,
> > > as it does not put us on the hook for maintaining the feature nor its
> > > format forever. We might want to call it "--debug=json" or something,
> > > though, in case we do want real stable json support later (though of
> > > course we would be free to steal the option then, since we're making no
> > > promises).
> >
> > Traditionally, we have not catered well to 3rd-party applications in Git,
> > and this JSON format would provide a way out of that problem.
> >
> > So I would like *not* to lock the door on letting this feature stabilize
> > organically.
> >
> > I'd be much more in favor of `--json[=<version>]`, with an initial version
> > of 0 to indicate that it really is unstable for now.
>
> Considering the amount of code to output these, supporting multiple
> formats would be a nightmare. I may be ok with versioning the output
> > so the tools know what format they need to deal with, but I'd rather
> support just one version.

Once the format has stabilized, I don't think it would be a huge burden to
support multiple formats, if we ever had to update.

It would, however, be a huge burden on third-party applications. In
effect, we could be lazy, but we would put a lot more burden on others
than we saved ourselves, so that would be a bit... selfish.

> For third parties wanting to dig deep, I think libgit2 would be a much
> better fit.

If we (i.e. the core Git contributors) were contributing new features/bug
fixes to libgit2, that would be a good recommendation.

But we don't. We essentially ignore libgit2 (and all of their learnings)
all the time.

Even worse, for years, even decades, we recommended the command-line as
"the API". If you want to reverse that recommendation, I think it merits a
bigger discussion than a flimsical comment buried in a thread about an
experimental feature.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-21 20:52         ` Jeff King
@ 2019-06-24  9:35           ` Johannes Schindelin
  0 siblings, 0 replies; 36+ messages in thread
From: Johannes Schindelin @ 2019-06-24  9:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Duy Nguyen, Git Mailing List

Hi Peff & Junio,

On Fri, 21 Jun 2019, Jeff King wrote:

> On Fri, Jun 21, 2019 at 08:10:58AM -0700, Junio C Hamano wrote:
>
> > Duy Nguyen <pclouds@gmail.com> writes:
> >
> > > Considering the amount of code to output these, supporting multiple
> > > formats would be a nightmare. I may be ok with versioning the output
> > > so the tools know what format they need to deal with, but I'd rather
> > > support just one version. For third parties wanting to dig deep, I
> > > think libgit2 would be a much better fit.
> >
> > Yeah, I think starting with --debug=json (or --debug-json) until we
> > see some stability in the output and get comfortable with the idea of
> > "version X" meaning what we output at that point, and then renaming
> > it to "--json" with "version: 1" in the output stream so that third
> > parties can use it (and interpret it according to version 1 rules) is
> > the way to go.  Third-party tools are welcome to read --debug-json
> > output as an early-adoption practice waiting for the real thing, but
> > we do not want to be locked into a schema too early before we are
> > ready.
>
> I should have read the whole thread before responding. I made a similar
> comment to Dscho, so I guess that is now two of us. :)

It is a bit of a chicken-and-egg problem. You want the format to
stabilize. But you also don't want to commit to one final format. And you
choose as option name a deliberately discouraging one, deterring the
(third-party application) developers who could most help you evolve the
format to a sensible and useful stable version.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-24  9:33       ` Johannes Schindelin
@ 2019-06-24  9:35         ` Duy Nguyen
  0 siblings, 0 replies; 36+ messages in thread
From: Duy Nguyen @ 2019-06-24  9:35 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, Git Mailing List

On Mon, Jun 24, 2019 at 4:32 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Duy,
>
> On Fri, 21 Jun 2019, Duy Nguyen wrote:
>
> > On Fri, Jun 21, 2019 at 8:16 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> >
> > > > I think your warning in the manpage that this is for debugging is fine,
> > > > as it does not put us on the hook for maintaining the feature nor its
> > > > format forever. We might want to call it "--debug=json" or something,
> > > > though, in case we do want real stable json support later (though of
> > > > course we would be free to steal the option then, since we're making no
> > > > promises).
> > >
> > > Traditionally, we have not catered well to 3rd-party applications in Git,
> > > and this JSON format would provide a way out of that problem.
> > >
> > > So I would like *not* to lock the door on letting this feature stabilize
> > > organically.
> > >
> > > I'd be much more in favor of `--json[=<version>]`, with an initial version
> > > of 0 to indicate that it really is unstable for now.
> >
> > Considering the amount of code to output these, supporting multiple
> > formats would be a nightmare. I may be ok with versioning the output
> > > so the tools know what format they need to deal with, but I'd rather
> > support just one version.
>
> Once the format has stabilized, I don't think it would be a huge burden to
> support multiple formats, if we ever had to update.
>
> It would, however, be a huge burden on third-party applications. In
> effect, we could be lazy, but we would put a lot more burden on others
> than we saved ourselves, so that would be a bit... selfish.

JSON is the land of high-level languages. They can adapt to a new format
quite easily, compared to restructuring C to support multiple
different formats. Yes, I'm quite OK with being selfish in this case.
-- 
Duy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/8] Add 'ls-files --json' to dump the index in json
  2019-06-21 20:51     ` Jeff King
@ 2019-06-24  9:52       ` Johannes Schindelin
  0 siblings, 0 replies; 36+ messages in thread
From: Johannes Schindelin @ 2019-06-24  9:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git

Hi Peff,

On Fri, 21 Jun 2019, Jeff King wrote:

> On Fri, Jun 21, 2019 at 03:16:52PM +0200, Johannes Schindelin wrote:
>
> > > I think your warning in the manpage that this is for debugging is fine,
> > > as it does not put us on the hook for maintaining the feature nor its
> > > format forever. We might want to call it "--debug=json" or something,
> > > though, in case we do want real stable json support later (though of
> > > course we would be free to steal the option then, since we're making no
> > > promises).
> >
> > Traditionally, we have not catered well to 3rd-party applications in Git,
> > and this JSON format would provide a way out of that problem.
> >
> > So I would like *not* to lock the door on letting this feature stabilize
> > organically.
>
> I'd like it to stabilize organically, too, but my thinking was that we'd
> wait a while and then promote it to a stable name eventually.

Git's command-line options have stabilized organically.

Example: to include untracked files in `git stash`, use `-u` or
`--include-untracked`, to include them in `git add`, use `-A` or `--all`,
to include them in `git grep`, use `--untracked` (no short option), to
include them in `git ls-files`, use `-o` or `--others`. The command `git
commit` does not even have an option to include untracked files.

You know of more examples of organically grown designs in Git, I am sure.
Given those examples, I am not sure that I want the JSON format to
stabilize organically.

> > I'd be much more in favor of `--json[=<version>]`, with an initial
> > version of 0 to indicate that it really is unstable for now.
>
> That's OK with me, too, if you think "0" indicates that sufficiently
> (we've used "v0" in a lot of other places to refer to stable protocols,
> like the git:// one). Maybe it's OK with some documentation making it
> clear.

I did think that the `0` would be clear, but you are probably right.

> I'm not sure whether we want to be locked into supporting this v0
> forever or not (though maybe it would not be such a burden).
>
> I think JSON-based output also has the potential to need fewer bumps.
> It's syntactically stable, so it's really just about our schema. And
> it's easy to say "newer versions of Git may produce new keys; you can
> ignore them", as long as we do not change the meaning of existing keys.
> That might be an easier promise to make.

Right.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/8] ls-files: add --json to dump the index
  2019-06-19 13:03   ` Derrick Stolee
  2019-06-21 13:04     ` Johannes Schindelin
@ 2019-06-24 12:50     ` Duy Nguyen
  1 sibling, 0 replies; 36+ messages in thread
From: Duy Nguyen @ 2019-06-24 12:50 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

On Wed, Jun 19, 2019 at 8:03 PM Derrick Stolee <stolee@gmail.com> wrote:
> > -     show_files(the_repository, &dir);
> > -
> > -     if (show_resolve_undo)
> > -             show_ru_info(the_repository->index);
> > +     if (!show_json) {
> > +             show_files(the_repository, &dir);
> > +
> > +             if (show_resolve_undo)
> > +                     show_ru_info(the_repository->index);
> > +     } else {
> > +             struct json_writer jw = JSON_WRITER_INIT;
> > +
> > +             discard_index(the_repository->index);
> > +             the_repository->index->jw = &jw;
> > +             if (repo_read_index(the_repository) < 0)
> > +                     die("index file corrupt");
> > +             puts(jw.json.buf);
> > +             the_repository->index->jw = NULL;
> > +             jw_release(&jw);
> > +     }
> >
> >       if (ps_matched) {
> >               int bad;
>
> I see this 'ps_matched' condition at the end, which is related to
> the '--error-unmatch' option. I added "--error-unmatch foo" to my
> command and got the appropriate error message:
>
>   error: pathspec 'foo' did not match any file(s) known to git
>   Did you forget to 'git add'?
>
> This was sent to stderr while the JSON was in stdout, so this should
> be appropriate to allow both options. Just pointing it out to make
> sure this is intended.

--error-unmatch only makes sense when you specify pathspec (like
"foo") but that does not work well with --json at all because we don't
do filtering (how do we even filter in extensions?). I'll just make
sure that "ls-files --json <pathspec>" is rejected. That'll cover
--error-unmatch.
-- 
Duy

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2019-06-24 12:50 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-19  9:58 [PATCH 0/8] Add 'ls-files --json' to dump the index in json Nguyễn Thái Ngọc Duy
2019-06-19  9:58 ` [PATCH 1/8] ls-files: add --json to dump the index Nguyễn Thái Ngọc Duy
2019-06-19 10:30   ` Ævar Arnfjörð Bjarmason
2019-06-19 13:03   ` Derrick Stolee
2019-06-21 13:04     ` Johannes Schindelin
2019-06-24 12:50     ` Duy Nguyen
2019-06-19  9:58 ` [PATCH 2/8] split-index.c: dump "link" extension as json Nguyễn Thái Ngọc Duy
2019-06-19  9:58 ` [PATCH 3/8] fsmonitor.c: dump "FSMN" " Nguyễn Thái Ngọc Duy
2019-06-19  9:58 ` [PATCH 4/8] resolve-undo.c: dump "REUC" " Nguyễn Thái Ngọc Duy
2019-06-19 13:16   ` Derrick Stolee
2019-06-19  9:58 ` [PATCH 5/8] read-cache.c: dump "EOIE" " Nguyễn Thái Ngọc Duy
2019-06-19  9:58 ` [PATCH 6/8] read-cache.c: dump "IEOT" " Nguyễn Thái Ngọc Duy
2019-06-19 13:18   ` Derrick Stolee
2019-06-19 13:24     ` Duy Nguyen
2019-06-19 14:26       ` Derrick Stolee
2019-06-19  9:58 ` [PATCH 7/8] cache-tree.c: dump "TREE" " Nguyễn Thái Ngọc Duy
2019-06-19  9:58 ` [PATCH 8/8] dir.c: dump "UNTR" " Nguyễn Thái Ngọc Duy
2019-06-19 11:58 ` [PATCH 0/8] Add 'ls-files --json' to dump the index in json Derrick Stolee
2019-06-19 12:42   ` Duy Nguyen
2019-06-19 12:48     ` Derrick Stolee
2019-06-19 19:17 ` Jeff King
2019-06-21  8:37   ` Duy Nguyen
2019-06-21 20:48     ` Jeff King
2019-06-21 13:16   ` Johannes Schindelin
2019-06-21 13:49     ` Duy Nguyen
2019-06-21 15:10       ` Junio C Hamano
2019-06-21 20:52         ` Jeff King
2019-06-24  9:35           ` Johannes Schindelin
2019-06-24  9:33       ` Johannes Schindelin
2019-06-24  9:35         ` Duy Nguyen
2019-06-21 20:51     ` Jeff King
2019-06-24  9:52       ` Johannes Schindelin
2019-06-20  4:00 ` Junio C Hamano
2019-06-20 19:12 ` Jeff Hostetler
2019-06-21 23:30 ` brian m. carlson
2019-06-22  2:54   ` Duy Nguyen
