git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Duy Nguyen <pclouds@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v2 00/10] Add 'ls-files --debug-json' to dump the index in json
Date: Tue, 25 Jun 2019 13:27:24 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1906251311280.44@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <CACsJy8BsT-GaVvEmqfk5n1jGmkcLG_bRjqcU0M3yefBmNSxmnA@mail.gmail.com>

Hi Duy,

On Tue, 25 Jun 2019, Duy Nguyen wrote:

> On Tue, Jun 25, 2019 at 1:00 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > > - extension location is printed, in case you need to decode the
> > >   extension by yourself (previously only the size is printed)
> > > - all extensions are printed in the same order they appear in the file
> > >   (previously eoie and ieot are printed first because that's how we
> > >   parse)
> > > - resolve undo extension is reorganized a bit to be easier to read
> > > - tests added. Example json files are in t/t3011
> >
> > It might actually make sense to optionally disable showing extensions.
> >
> > You also forgot to mention that you explicitly disable handling
> > `<pathspec>`, which I find a bit odd, personally, as that would probably
> > come in real handy at times,
>
> No. I mentioned the land of high level languages before. Filtering in
> any Python, Ruby, Scheme, JavaScript, Java is a piece of cake and much
> more flexible than pathspec.

I heard that type of argument before. I was working on the initial Windows
port of Git, uh, of course I was working on a big refactoring of a big C++
application backed by a database. A colleague suggested that filtering
could be done much better in C++, on the desktop, than in SQL. And so they
changed the paradigm to "simplify" the SQL query, and instead dropped the
unwanted data after it had hit the RAM of the client machine.

Turns out it was a bad idea. A _really_ bad idea. Because it required
downloading 30MB of data for about several dozens computers in parallel,
at the start of every shift.

This change was reverted in one big hurry, and the colleague was tasked to
learn them some SQL.

Why am I telling you this story? Because you fall into the exact same trap
as my colleague.

In this instance, it may not be so much network bandwidth, but it is still
quite a suboptimal idea to render JSON for possibly tens of thousands of
files, then parse the same JSON on the receiving side, the spend even more
time to drop all but a dozen files.

And this is _even more_ relevant when you want to debug things.

In short: I am quite puzzled why this is even debated here. There is a
reason, a good reason, why `git ls-files` accepts pathspecs. I would not
want to ignore the lessons of history as willfully here.

> Even with shell scripts, jq could do a much better job than pathspec. If
> you filter by pathspec, good luck trying that on extensions.

You keep harping on extensions, but the reality of the matter is that they
are rarely interesting. I would even wager a bet that we will end up
excluding them from the JSON output by default.

Most of the times when I had to decode the index file manually in the
past, it was about the regular file entries.

There was *one* week in which I had to decode the untracked cache a bit,
to the point where I patched the test helper locally to help me with that.

If my experience in debugging these things is any indicator, extensions do
not matter even 10% of the non-extension data.

And that's not even taking into account the third-party software that
could definitely benefit from having this JSON format as query result.

In my work as Git for Windows maintainer, I do hear about the needs of
third-party software developers quite a bit, so I would claim that I know
a bit about what they need, why the NUL-terminated format is not a good
match, and how much a JSON-based API would help.

So while that is not your pet, it will be the most useful part of the
outcome of your work.

> It's the same reason why I will not provide a flexible way to disable
> extensions. I'm not starting a JSON API for Git. I provide an index
> file in JSON format. You do what you want with it. You have a format
> easy enough to import to native data structures of your favorite
> language.

I understand that you don't care.

Your patch series is just too good a start on something truly useful to
pass up on the opportunity.

> > especially when we offer this as a better way for 3rd-party
> > applications to interact with Git (which I think will be the use case
> > for this feature that will be _far_ more common than using it for
> > debugging).
>
> We may have conflicting goals. For me, first priority is the debug
> tool for Git developers. 3rd-party support is a stretch. I could move
> all this back to test-tool, then you can provide a 3rd-party API if
> you want. Or I'll withdraw this series and go back to my original
> plan.

You don't need JSON if you want to debug things. That would be a lot of
love lost, if debugging was your goal.

I guess I'll wait until your patch series hits `next`, and then try to
find some time to work on that feature.

Ciao,
Johannes

  parent reply	other threads:[~2019-06-25 11:27 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-24 13:02 [PATCH v2 00/10] Add 'ls-files --debug-json' to dump the index in json Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 01/10] ls-files: add --json to dump the index Nguyễn Thái Ngọc Duy
2019-06-24 19:15   ` Jeff Hostetler
2019-06-24 20:04     ` Junio C Hamano
2019-06-25  9:21     ` Johannes Schindelin
2019-06-25  9:52     ` Duy Nguyen
2019-06-25 15:37       ` Jeff Hostetler
2019-06-25  9:05   ` Thomas Gummerer
2019-06-25  9:44   ` Johannes Schindelin
2019-06-25 11:31     ` Johannes Schindelin
2019-06-25 13:57       ` Johannes Schindelin
2019-06-25 22:28     ` Junio C Hamano
2019-06-26 19:51   ` Junio C Hamano
2019-06-24 13:02 ` [PATCH v2 02/10] read-cache.c: dump common extension info in json Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 03/10] cache-tree.c: dump "TREE" extension as json Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 04/10] dir.c: dump "UNTR" " Nguyễn Thái Ngọc Duy
2019-06-24 19:32   ` Jeff Hostetler
2019-06-24 13:02 ` [PATCH v2 05/10] split-index.c: dump "link" " Nguyễn Thái Ngọc Duy
2019-06-24 20:06   ` Jeff Hostetler
2019-06-25 10:29     ` Duy Nguyen
2019-06-25 12:40       ` Derrick Stolee
2019-06-27 10:48         ` Duy Nguyen
2019-06-27 13:24           ` Jeff Hostetler
2019-06-27 13:42             ` Derrick Stolee
2019-06-27 13:47               ` Duy Nguyen
2019-07-03  9:08   ` SZEDER Gábor
2019-07-04 20:01   ` SZEDER Gábor
2019-07-04 23:54     ` Duy Nguyen
2019-07-08 17:58       ` Junio C Hamano
2019-06-24 13:02 ` [PATCH v2 06/10] fsmonitor.c: dump "FSMN" " Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 07/10] resolve-undo.c: dump "REUC" " Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 08/10] read-cache.c: dump "EOIE" " Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 09/10] read-cache.c: dump "IEOT" " Nguyễn Thái Ngọc Duy
2019-06-24 13:02 ` [PATCH v2 10/10] t3008: use the new SINGLE_CPU prereq Nguyễn Thái Ngọc Duy
2019-06-24 18:00 ` [PATCH v2 00/10] Add 'ls-files --debug-json' to dump the index in json Johannes Schindelin
2019-06-24 18:39   ` Jeff Hostetler
2019-06-25  9:05   ` Duy Nguyen
2019-06-25  9:38     ` Thomas Gummerer
2019-06-25 11:27     ` Johannes Schindelin [this message]
2019-06-25 12:06       ` Duy Nguyen
2019-06-25 14:10         ` Johannes Schindelin
2019-06-25 17:08           ` Ramsay Jones
2019-06-26 15:05             ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1906251311280.44@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).