git@vger.kernel.org mailing list mirror (one of many)
From: Duy Nguyen <pclouds@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: "Jeff Hostetler" <git@jeffhostetler.com>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH v2 05/10] split-index.c: dump "link" extension as json
Date: Thu, 27 Jun 2019 17:48:59 +0700
Message-ID: <CACsJy8CwWvKNbYvDqWc-zCwEPc_rz-P4y-SvXV-9jL8_XCFjZQ@mail.gmail.com>
In-Reply-To: <98afb501-ef57-9b64-7ffb-f13cea6fd58a@gmail.com>

On Tue, Jun 25, 2019 at 7:40 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/25/2019 6:29 AM, Duy Nguyen wrote:
> > On Tue, Jun 25, 2019 at 3:06 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
> >> I'm curious how big these EWAHs will be in practice and
> >> how useful an array of integers will be (especially as the
> >> pretty format will be one integer per line).  Perhaps it
> >> would be helpful to have an extended example in one of the
> >> tests.
> >
> > It's one integer per updated entry. So if you have a giant index and
> > updated every single one of them, the EWAH bitmap contains that many
> > integers.
> >
> > If it was easy to just merge these bitmaps back to the entry (e.g. in
> > this example, add "replaced": true to entry zero) I would have done
> > it. But we dump as we stream and it's already too late to do it.
> >
> >> Would it be better to have the caller of ewah_each_bit()
> >> build a hex or bit string in a strbuf and then write it
> >> as a single string?
> >
> > I don't think the current EWAH representation is easy to read in the
> > first place. You'll probably have to run through some script to update
> > the main entries part and will have a much better view, but that's
> > pretty quick. If it's for scripts, then it's probably best to keep as
> > an array of integers, not a string. Less post processing.
>
> I don't think the intent is to dump the EWAH directly, but instead to
> dump a string of the uncompressed bitmap. Something like:
>
>         "delete_bitmap" : "011011011101"
>
> instead of
>
>         "delete_bitmap" : [ 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1 ]

I get this part. But the numbers in the array are the positions of the
set bits; the array does not show the raw bitmap itself.

The same bitmap would currently be displayed as

 "delete_bitmap": [ 1, 2, 4, 5, 7, 8, 9, 11 ]

And that maps back to entry[1], entry[2], entry[4]... being deleted
from the base index. So displaying a real bitmap actually adds more
work for both the reader and the tool, because either way you have to
calculate the position of each set bit. And it gets harder if the bit
you're interested in is on the far right.
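
To illustrate the point about positions, here is a minimal sketch (not
code from the patch) of the conversion a reader or tool would have to do
if the bitmap were dumped as a string of bits. The function names are
hypothetical, and the input is the 12-bit example array from above
written as a string.

```python
def set_bit_positions(bits):
    """Return the indices of the '1' characters in a bit string."""
    return [i for i, b in enumerate(bits) if b == "1"]


def to_bit_string(positions, size):
    """Inverse operation: render set-bit positions as a bit string of `size` bits."""
    wanted = set(positions)
    return "".join("1" if i in wanted else "0" for i in range(size))


# The example bitmap [ 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1 ] as a string.
example = "011011011101"
positions = set_bit_positions(example)
print(positions)
```

With the array-of-positions format, the first step is already done: each
number is directly the index of an affected entry.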

> > Another reason for not merging to one string (might not be a very good
> > argument though) is to help diff between two indexes.
> > One-number-per-line works well with "git diff --no-index" while one
> > long string is a bit harder. I did this kind of comparison when I made
> > changes in read-cache.c and wanted to check if the new index file is
> > completely broken, or just slightly broken.
>
> You're right that the diff of the json output is an interesting
> use, and the "single string" output is not helpful. What about
> batches of 64-bit strings? For example:
>
>         "delete_bitmap" : [
>                 "0101010101010101010101010101010101010101010101010101010101010101",
>                 "0101010101010101010101010101010101010101010101010101010101010101",
>                 "0101010101010101010101010101010101010101010101010101010101010101",
>                 "01010101010101"
>         ]
>
> This could be a happy medium between the two options, but does require
> some extra work in the formatter.

And the reader/parser too, since you have to join that array back into
one string first.
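
As a rough sketch of that extra work (hypothetical helper names, not
part of the patch), a consumer of the chunked format would need
something like the following before it can map bits to entries. The
chunk width of 64 is taken from the proposal above.

```python
def chunk_bits(bits, width=64):
    """Formatter side: split one long bit string into `width`-sized chunks."""
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def positions_from_chunks(chunks):
    """Parser side: join the chunks back together, then find the set bits."""
    bits = "".join(chunks)
    return [i for i, b in enumerate(bits) if b == "1"]


# The chunked example from the proposal: three full chunks plus a short tail.
chunks = ["01" * 32, "01" * 32, "01" * 32, "01" * 7]
positions = positions_from_chunks(chunks)
print(len(positions), positions[:4])
```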
--
Duy
