git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>, git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH v2 01/10] ls-files: add --json to dump the index
Date: Mon, 24 Jun 2019 15:15:54 -0400	[thread overview]
Message-ID: <755a4cfe-fd6b-044b-dca2-05eebfa518b1@jeffhostetler.com> (raw)
In-Reply-To: <20190624130226.17293-2-pclouds@gmail.com>



On 6/24/2019 9:02 AM, Nguyễn Thái Ngọc Duy wrote:
> So far we don't have a command to basically dump the index file out,
> with all its glory details. Checking some info, for example, stat
> time, usually involves either writing new code or firing up "xxd" and
> decoding values by yourself.
> 
> This --json is supposed to help that. It dumps the index in a human
> readable format but also easy to be processed with tools. And it will
> print almost enough info to reconstruct the index later.
> 
> In this patch we only dump the main part, not extensions. But at the
> end of the series, the entire index is dumped. The end result could be
> very verbose even on a small repository such as git.git.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---

...

> diff --git a/json-writer.c b/json-writer.c
> index aadb9dbddc..0608726512 100644
> --- a/json-writer.c
> +++ b/json-writer.c
> @@ -202,6 +202,28 @@ void jw_object_null(struct json_writer *jw, const char *key)
>   	strbuf_addstr(&jw->json, "null");
>   }
>   
> +void jw_object_filemode(struct json_writer *jw, const char *key, mode_t mode)
> +{
> +	object_common(jw, key);
> +	strbuf_addf(&jw->json, "\"%06o\"", mode);
> +}
> +
> +void jw_object_stat_data(struct json_writer *jw, const char *name,
> +			 const struct stat_data *sd)

Should this be in json_writer.c or in read-cache.c ?
Currently, json_writer.c is concerned with formatting
JSON on basic/scalar types.  Do we want to start
extending it to handle arbitrary structures?  Or would
it be better for the code that defines/manipulates the
structure to define a "stat_data_dump_json()" function.

I'm torn on the jw_object_filemode() function, JSON format
limits us to decimal integers and there are places where
I'd like to have hex, or in this case octal values.

I'm thinking it'd be better to have a helper function in
read-cache.c that formats a local strbuf and calls
js_object_string(&jw, key, buf);


> +{
> +	jw_object_inline_begin_object(jw, name);
> +	jw_object_intmax(jw, "ctime_sec", sd->sd_ctime.sec);
> +	jw_object_intmax(jw, "ctime_nsec", sd->sd_ctime.nsec);
> +	jw_object_intmax(jw, "mtime_sec", sd->sd_mtime.sec);
> +	jw_object_intmax(jw, "mtime_nsec", sd->sd_mtime.nsec);

It'd be nice if we could also have a formatted date
for the mtime and ctime in addition to the integer
values.  (I'm not sure whether you'd always want them
or make it a verbose option.)

> +	jw_object_intmax(jw, "device", sd->sd_dev);
> +	jw_object_intmax(jw, "inode", sd->sd_ino);
> +	jw_object_intmax(jw, "uid", sd->sd_uid);
> +	jw_object_intmax(jw, "gid", sd->sd_gid);
> +	jw_object_intmax(jw, "size", sd->sd_size);
> +	jw_end(jw);
> +}
> +
>   static void increase_indent(struct strbuf *sb,
>   			    const struct json_writer *jw,
>   			    int indent)
> diff --git a/json-writer.h b/json-writer.h
> index 83906b09c1..c48c4cbf33 100644
> --- a/json-writer.h
> +++ b/json-writer.h
> @@ -42,8 +42,11 @@
>    * of the given strings.
>    */
>   
> +#include "git-compat-util.h"
>   #include "strbuf.h"
>   
> +struct stat_data;
> +
>   struct json_writer
>   {
>   	/*
> @@ -81,6 +84,9 @@ void jw_object_true(struct json_writer *jw, const char *key);
>   void jw_object_false(struct json_writer *jw, const char *key);
>   void jw_object_bool(struct json_writer *jw, const char *key, int value);
>   void jw_object_null(struct json_writer *jw, const char *key);
> +void jw_object_filemode(struct json_writer *jw, const char *key, mode_t value);
> +void jw_object_stat_data(struct json_writer *jw, const char *key,
> +			 const struct stat_data *sd);
>   void jw_object_sub_jw(struct json_writer *jw, const char *key,
>   		      const struct json_writer *value);
>   
> @@ -104,4 +110,21 @@ void jw_array_inline_begin_array(struct json_writer *jw);
>   int jw_is_terminated(const struct json_writer *jw);
>   void jw_end(struct json_writer *jw);
>   
> +/*
> + * These _gently versions accept NULL json_writer to reduce too much
> + * branching at the call site.
> + */
> +static inline void jw_object_inline_begin_array_gently(struct json_writer *jw,
> +						       const char *name)
> +{
> +	if (jw)
> +		jw_object_inline_begin_array(jw, name);
> +}
> +
> +static inline void jw_end_gently(struct json_writer *jw)
> +{
> +	if (jw)
> +		jw_end(jw);
> +}
> +
>   #endif /* JSON_WRITER_H */
> diff --git a/read-cache.c b/read-cache.c
> index 4dd22f4f6e..db5147d088 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -25,6 +25,7 @@
>   #include "fsmonitor.h"
>   #include "thread-utils.h"
>   #include "progress.h"
> +#include "json-writer.h"
>   
>   /* Mask for the name length in ce_flags in the on-disk index */
>   
> @@ -1952,6 +1953,49 @@ static void *load_index_extensions(void *_data)
>   	return NULL;
>   }
>   
> +static void dump_cache_entry(struct index_state *istate,
> +			     int index,
> +			     unsigned long offset,
> +			     const struct cache_entry *ce)
> +{
> +	struct json_writer *jw = istate->jw;
> +
> +	jw_array_inline_begin_object(jw);
> +
> +	/*
> +	 * this is technically redundant, but it's for easier
> +	 * navigation when there hundreds of entries
> +	 */
> +	jw_object_intmax(jw, "id", index);
> +
> +	jw_object_string(jw, "name", ce->name);
> +
> +	jw_object_filemode(jw, "mode", ce->ce_mode);
> +
> +	jw_object_intmax(jw, "flags", ce->ce_flags);

It would be nice to have the flags as a hex-formatted string
in addition to (or instead of) the decimal integer value.

> +	/*
> +	 * again redundant info, just so you don't have to decode
> +	 * flags values manually
> +	 */
> +	if (ce->ce_flags & CE_EXTENDED)
> +		jw_object_true(jw, "extended_flags");
> +	if (ce->ce_flags & CE_VALID)
> +		jw_object_true(jw, "assume_unchanged");
> +	if (ce->ce_flags & CE_INTENT_TO_ADD)
> +		jw_object_true(jw, "intent_to_add");
> +	if (ce->ce_flags & CE_SKIP_WORKTREE)
> +		jw_object_true(jw, "skip_worktree");
> +	if (ce_stage(ce))
> +		jw_object_intmax(jw, "stage", ce_stage(ce));
> +
> +	jw_object_string(jw, "oid", oid_to_hex(&ce->oid));
> +
> +	jw_object_stat_data(jw, "stat", &ce->ce_stat_data);
> +	jw_object_intmax(jw, "file_offset", offset);
> +
> +	jw_end(jw);
> +}
> +
>   /*
>    * A helper function that will load the specified range of cache entries
>    * from the memory mapped file and add them to the given index.
> @@ -1972,6 +2016,9 @@ static unsigned long load_cache_entry_block(struct index_state *istate,
>   		ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, &consumed, previous_ce);
>   		set_index_entry(istate, i, ce);
>   
> +		if (istate->jw)
> +			dump_cache_entry(istate, i, src_offset, ce);
> +
>   		src_offset += consumed;
>   		previous_ce = ce;
>   	}
> @@ -1983,6 +2030,8 @@ static unsigned long load_all_cache_entries(struct index_state *istate,
>   {
>   	unsigned long consumed;
>   
> +	jw_object_inline_begin_array_gently(istate->jw, "entries");
> +
>   	if (istate->version == 4) {
>   		mem_pool_init(&istate->ce_mem_pool,
>   				estimate_cache_size_from_compressed(istate->cache_nr));
> @@ -1993,6 +2042,8 @@ static unsigned long load_all_cache_entries(struct index_state *istate,
>   
>   	consumed = load_cache_entry_block(istate, istate->ce_mem_pool,
>   					0, istate->cache_nr, mmap, src_offset, NULL);
> +
> +	jw_end_gently(istate->jw);
>   	return consumed;
>   }
>   
> @@ -2120,6 +2171,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>   	size_t extension_offset = 0;
>   	int nr_threads, cpus;
>   	struct index_entry_offset_table *ieot = NULL;
> +	int jw_pretty = 1;
>   
>   	if (istate->initialized)
>   		return istate->cache_nr;
> @@ -2154,6 +2206,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>   	istate->cache_nr = ntohl(hdr->hdr_entries);
>   	istate->cache_alloc = alloc_nr(istate->cache_nr);
>   	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
> +	istate->timestamp.sec = st.st_mtime;
> +	istate->timestamp.nsec = ST_MTIME_NSEC(st);
>   	istate->initialized = 1;
>   
>   	p.istate = istate;
> @@ -2176,6 +2230,20 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>   	if (!HAVE_THREADS)
>   		nr_threads = 1;
>   
> +	if (istate->jw) {
> +		jw_object_begin(istate->jw, jw_pretty);
> +		jw_object_intmax(istate->jw, "version", istate->version);
> +		jw_object_string(istate->jw, "oid", oid_to_hex(&istate->oid));
> +		jw_object_intmax(istate->jw, "mtime_sec", istate->timestamp.sec);
> +		jw_object_intmax(istate->jw, "mtime_nsec", istate->timestamp.nsec);

again, it would be nice to also have a formated version of the mtime.

> +
> +		/*
> +		 * Threading may mess up json writing. This is for
> +		 * debugging only, so performance is not a concern.
> +		 */
> +		nr_threads = 1;

yes. we should turn off threading when dumping to json.

> +	}
> +
>   	if (nr_threads > 1) {
>   		extension_offset = read_eoie_extension(mmap, mmap_size);
>   		if (extension_offset) {
> @@ -2204,8 +2272,6 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>   		src_offset += load_all_cache_entries(istate, mmap, mmap_size, src_offset);
>   	}
>   
> -	istate->timestamp.sec = st.st_mtime;
> -	istate->timestamp.nsec = ST_MTIME_NSEC(st);
>   
>   	/* if we created a thread, join it otherwise load the extensions on the primary thread */
>   	if (extension_offset) {
> @@ -2216,6 +2282,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>   		p.src_offset = src_offset;
>   		load_index_extensions(&p);
>   	}
> +	jw_end_gently(istate->jw);
> +
>   	munmap((void *)mmap, mmap_size);
>   
>   	/*
> 
...

Thanks
Jeff

  reply	other threads:[~2019-06-24 19:15 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-24 13:02 [PATCH v2 00/10] Add 'ls-files --debug-json' to dump the index in json Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 01/10] ls-files: add --json to dump the index Nguyễn Thái Ngọc Duy
2019-06-24 19:15   ` Jeff Hostetler [this message]
2019-06-24 20:04     ` Junio C Hamano
2019-06-25  9:21     ` Johannes Schindelin
2019-06-25  9:52     ` Duy Nguyen
2019-06-25 15:37       ` Jeff Hostetler
2019-06-25  9:05   ` Thomas Gummerer
2019-06-25  9:44   ` Johannes Schindelin
2019-06-25 11:31     ` Johannes Schindelin
2019-06-25 13:57       ` Johannes Schindelin
2019-06-25 22:28     ` Junio C Hamano
2019-06-26 19:51   ` Junio C Hamano
2019-06-24 13:02 ` [PATCH v2 02/10] read-cache.c: dump common extension info in json Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 03/10] cache-tree.c: dump "TREE" extension as json Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 04/10] dir.c: dump "UNTR" " Nguyễn Thái Ngọc Duy
2019-06-24 19:32   ` Jeff Hostetler
2019-06-24 13:02 ` [PATCH v2 05/10] split-index.c: dump "link" " Nguyễn Thái Ngọc Duy
2019-06-24 20:06   ` Jeff Hostetler
2019-06-25 10:29     ` Duy Nguyen
2019-06-25 12:40       ` Derrick Stolee
2019-06-27 10:48         ` Duy Nguyen
2019-06-27 13:24           ` Jeff Hostetler
2019-06-27 13:42             ` Derrick Stolee
2019-06-27 13:47               ` Duy Nguyen
2019-07-03  9:08   ` SZEDER Gábor
2019-07-04 20:01   ` SZEDER Gábor
2019-07-04 23:54     ` Duy Nguyen
2019-07-08 17:58       ` Junio C Hamano
2019-06-24 13:02 ` [PATCH v2 06/10] fsmonitor.c: dump "FSMN" " Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 07/10] resolve-undo.c: dump "REUC" " Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 08/10] read-cache.c: dump "EOIE" " Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 09/10] read-cache.c: dump "IEOT" " Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 10/10] t3008: use the new SINGLE_CPU prereq Nguyễn Thái Ngọc Duy
2019-06-24 18:00 ` [PATCH v2 00/10] Add 'ls-files --debug-json' to dump the index in json Johannes Schindelin
2019-06-24 18:39   ` Jeff Hostetler
2019-06-25  9:05   ` Duy Nguyen
2019-06-25  9:38     ` Thomas Gummerer
2019-06-25 11:27     ` Johannes Schindelin
2019-06-25 12:06       ` Duy Nguyen
2019-06-25 14:10         ` Johannes Schindelin
2019-06-25 17:08           ` Ramsay Jones
2019-06-26 15:05             ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=755a4cfe-fd6b-044b-dca2-05eebfa518b1@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).