From: Taylor Blau <me@ttaylorr.com>
To: Abhradeep Chakraborty via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Kaartic Sivaraam <kaartic.sivaraam@gmail.com>,
	Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Subject: Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
Date: Mon, 20 Jun 2022 13:21:50 -0400
Message-ID: <YrCsricF+2rQXiBk@nand.local>
In-Reply-To: <2e22ca5069af617fe23072d78efb08b26d6130be.1655728395.git.gitgitgadget@gmail.com>

On Mon, Jun 20, 2022 at 12:33:09PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> When reading bitmap file, git loads each and every bitmap one by one
> even if all the bitmaps are not required. A "bitmap lookup table"
> extension to the bitmap format can reduce the overhead of loading
> bitmaps which stores a list of bitmapped commit oids, along with their
> offset and xor offset. This way git can load only the neccesary bitmaps
> without loading the previous bitmaps.

Well put. It might help to have a concrete example of where we expect
this to help and not help. I suspect that some of this will show up in
your work updating the perf suite to use this new table, but I imagine
that we'll find something like:

    In cases where the result can be read or computed without
    significant additional traversal (e.g., all commits of interest
    already have bitmaps computed), we can save some time loading and
    parsing a majority of the bitmap file that we will never read.

    But in cases where the bitmaps are out-of-date, or there is
    significant traversal required to go from the reference tips to
    what's contained in the .bitmap file, this table provides minimal
    benefit (or something).

Of course, you should verify that that is actually true before we insert
it into the commit message as such ;-). But that sort of information may
help readers understand what the purpose of this change is towards the
beginning of the series.

> Add some information for the new "bitmap lookup table" extension in the
> bitmap-format documentation.
>
> Co-Authored-by: Taylor Blau <ttaylorr@github.com>
> Mentored-by: Taylor Blau <ttaylorr@github.com>

Here and elsewhere: I typically use my <me@ttaylorr.com> address when
contributing to Git. So any trailers that mention my email or commits
that you send on my behalf should use that address, too.

> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  Documentation/technical/bitmap-format.txt | 31 +++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> index 04b3ec21785..34e98787b78 100644
> --- a/Documentation/technical/bitmap-format.txt
> +++ b/Documentation/technical/bitmap-format.txt
> @@ -67,6 +67,14 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  			pack/MIDX. The format and meaning of the name-hash is
>  			described below.
>
> +			** {empty}
> +			BITMAP_OPT_LOOKUP_TABLE (0xf) : :::

Is the space between "(0xf)" and the first ":" intentional? Similarly,
should there be two or three colons at the end (either "::" or ":::")?

> +			If present, the end of the bitmap file contains a table
> +			containing a list of `N` object ids, a list of pairs of
> +			offset and xor offset of respective objects, and 4-byte
> +			integer denoting the flags (currently none). The format
> +			and meaning of the table is described below.
> +

I remember we had a brief off-list discussion about whether we should
store the full object IDs in the offset table, or whether we could store
their pack- or index-relative ordering. Is there a reason to prefer one
or the other?

I don't think we need to explain the choice fully in the documentation
in this patch, but it may be worth thinking about separately
nonetheless. We can store either order and convert it to an object ID in
constant time.

To figure out which is best, I would recommend trying a few different
choices here and seeing how they do or don't impact your performance
testing.
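
As a rough illustration of the constant-time conversion mentioned above
(not something taken from the patch), here is a minimal Python sketch.
It assumes a hypothetical flat buffer of lexicographically sorted,
fixed-width 20-byte (SHA-1) object IDs, similar in spirit to the sorted
name table of a pack index. The names `OID_LEN`, `build_oid_table`, and
`oid_at` are made up for this example and are not Git's API.

```python
import hashlib

OID_LEN = 20  # assumed SHA-1 object IDs; SHA-256 repositories would use 32


def build_oid_table(oids):
    """Concatenate object IDs in lexicographic order into one flat buffer."""
    return b"".join(sorted(oids))


def oid_at(table, pos):
    """Return the object ID stored at index-relative position `pos` in O(1)."""
    if pos < 0:
        raise IndexError(pos)
    start = pos * OID_LEN
    end = start + OID_LEN
    if end > len(table):
        raise IndexError(pos)
    return table[start:end]


# Stand-in object IDs, generated only for the demonstration.
oids = [hashlib.sha1(str(i).encode()).digest() for i in range(5)]
table = build_oid_table(oids)
```

With a layout like this, storing a 4-byte position instead of the full
object ID costs only one multiplication and a slice at lookup time;
whether that is a net win is exactly what the performance testing
should show.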

>  		4-byte entry count (network byte order)
>
>  			The total count of entries (bitmapped commits) in this bitmap index.
> @@ -205,3 +213,26 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
>  If implementations want to choose a different hashing scheme, they are
>  free to do so, but MUST allocate a new header flag (because comparing
>  hashes made under two different schemes would be pointless).
> +
> +Commit lookup table
> +-------------------
> +
> +If the BITMAP_OPT_LOOKUP_TABLE flag is set, the end of the `.bitmap`
> +contains a lookup table specifying the positions of commits which have a
> +bitmap.
> +
> +For a `.bitmap` containing `nr_entries` reachability bitmaps, the format
> +is as follows:
> +
> +	- `nr_entries` object names.
> +
> +	- `nr_entries` pairs of 4-byte integers, each in network order.
> +	  The first holds the offset from which that commit's bitmap can
> +	  be read. The second number holds the position of the commit
> +	  whose bitmap the current bitmap is xor'd with in lexicographic
> +	  order, or 0xffffffff if the current commit is not xor'd with
> +	  anything.

A couple of small thoughts here. I wonder if we'd get better locality if
we made each record look something like:

    (object_id, offset, xor_pos)

Where object_id is either 20 or 4 bytes long (depending on whether we store the
full object ID, or some 4-byte identifier that allows us to discover
it), offset is 8 bytes long, and xor_pos is 4-bytes (since in practice
we don't support packs or MIDXs which have more than 2^32-1 objects).

In the event that this table doesn't fit into a single cache line, I
think we'll get better performance out of reading it by not forcing the
cache to evict itself whenever we need to refer back to the object_id.
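
To make the proposed record layout concrete, here is a minimal Python
sketch of an interleaved table, assuming full 20-byte (SHA-1) object
IDs, an 8-byte offset, and a 4-byte xor position, all in network byte
order. `RECORD`, `pack_table`, and `lookup` are hypothetical names for
illustration and do not reflect Git's actual reader or writer.

```python
import struct

OID_LEN = 20          # assumed SHA-1; a 4-byte identifier would also work
NO_XOR = 0xFFFFFFFF   # sentinel from the patch: not xor'd with anything

# One record: object_id, 8-byte offset, 4-byte xor position, big-endian.
# "!" also disables padding, so each record is exactly 20 + 8 + 4 bytes.
RECORD = struct.Struct(f"!{OID_LEN}sQI")


def pack_table(entries):
    """Serialize (object_id, offset, xor_pos) tuples, sorted by object ID.

    xor_pos is stored as given; the caller is assumed to have computed it
    against the sorted order.
    """
    return b"".join(RECORD.pack(*e) for e in sorted(entries))


def lookup(table, oid):
    """Binary-search the table for `oid`; return (offset, xor_pos) or None."""
    lo, hi = 0, len(table) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        rec_oid, offset, xor_pos = RECORD.unpack_from(table, mid * RECORD.size)
        if rec_oid == oid:
            return offset, xor_pos
        if rec_oid < oid:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Because each probe of the binary search reads the object ID, offset,
and xor position from the same 32-byte record, a successful lookup does
not need to touch a second, distant array.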

> +	- One 4-byte network byte order integer specifying
> +	  table-specific flags. None exist currently, so this is always
> +	  "0".

I mentioned this in my reply to Stolee earlier, but I think that we should
either (a) try to remember what this is for and document it, or (b)
remove it.

Thanks,
Taylor
