git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, sandals@crustytoothpaste.net,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v3 0/6] Create commit-graph file format v2
Date: Wed, 01 May 2019 22:25:59 +0200
Message-ID: <87lfzprkfc.fsf@evledraar.gmail.com>
In-Reply-To: <pull.112.v3.git.gitgitgadget@gmail.com>


On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote:

> The commit-graph file format has some shortcomings that were discussed
> on-list:
>
>  1. It doesn't use the 4-byte format ID from the_hash_algo.
>
>  2. There is no way to change the reachability index from generation numbers
>     to corrected commit date [1].
>
>  3. The unused byte in the format could be used to signal the file is
>     incremental, but current clients ignore the value even if it is
>     non-zero.
>
> This series adds a new version (2) to the commit-graph file. The fifth byte
> already specified the file format, so existing clients will gracefully
> respond to files with a different version number. The only real change now
> is that the header takes 12 bytes instead of 8, due to using the 4-byte
> format ID for the hash algorithm.
>
> The new bytes reserved for the reachability index version and incremental
> file formats are now expected to be equal to the defaults. When we update
> these values to be flexible in the future, if a client understands
> commit-graph v2 but not those new values, then it will fail gracefully.
>
> NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts
> were significant and subtle.
>
> Updates in V3: Thanks for all the feedback so far!
>
>  * Moved the version information into an unsigned char parameter, instead of
>    a flag.
>
>  * We no longer default to the v2 file format, as that will break users who
>    downgrade. This required some changes to the test script.
>
>  * Removed the "Future work" section from the commit-graph design document
>    in a new patch.
>
>  * I did not change the file name for v2 file formats, as Ævar suggested.
>    I'd like the discussion to continue on this topic.

I won't repeat my outstanding v2 feedback about v1 & v2
incompatibilities, except to say that I'd in principle be fine with
having a v2 format the way this series is adding it. I.e. saying "here
it is, it's never written by default, we'll figure out these compat
issues later".

My only objection/nit on that point would be that the current
docs/commit messages should make some mention of the really bad
interactions between v1 and v2 on different git versions.

However, having written this again I really don't understand why we need
a v2 of this format at all.

The current format is:

    <CGPH signature>
    <CGPH version = 1>
    <hash version (0..255) where 1 == SHA-1>
    <num chunks (0..255)>
    <reserved byte ignored>
    [chunk offsets for our $num_chunks]
    [arbitrary chunk data for our $num_chunks]
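To make the v1 layout above concrete, here is a minimal sketch of reading those eight header bytes. The helper name parse_v1_header and the returned dict are made up for illustration; this is not git's C implementation, only the byte layout listed above.

```python
import struct

# v1 header as listed above: 4-byte signature followed by four single bytes
# (file version, hash version, number of chunks, reserved).
V1_HEADER = struct.Struct(">4sBBBB")


def parse_v1_header(data: bytes) -> dict:
    """Parse the 8-byte v1 header (hypothetical helper, for illustration)."""
    if len(data) < V1_HEADER.size:
        raise ValueError("file too short for a commit-graph header")
    signature, version, hash_version, num_chunks, _reserved = (
        V1_HEADER.unpack_from(data)
    )
    if signature != b"CGPH":
        raise ValueError("bad signature")
    if version != 1:
        raise ValueError(f"unsupported file version {version}")
    if hash_version != 1:
        raise ValueError(f"unsupported hash version {hash_version}")
    # The reserved byte is read but deliberately ignored, as in the v1 layout.
    return {
        "version": version,
        "hash_version": hash_version,
        "num_chunks": num_chunks,
    }
```

Note that a non-zero reserved byte does not cause an error here, which matches the "reserved byte ignored" behaviour of v1.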

And you want to change it to:

    <CGPH signature>
    <CGPH version = 2>
    <num chunks (0..255)>
    <reachability index version, hard error on values != 1 (should have seen this in my [1])>
    <reserved byte hard error on values != 0 [1]>
    <hash version 32 bit. So 0x73686131 = "sha1" instead of "1">
    [chunk offsets for our $num_chunks]
    [arbitrary chunk data for our $num_chunks]
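For comparison, a sketch of the proposed 12-byte v2 header, following the field order as I have listed it above (which may not match the patch byte for byte). The helper name parse_v2_header is hypothetical; the point is only that two more fields become hard errors.

```python
import struct

# Proposed v2 header as listed above: signature, file version, number of
# chunks, reachability index version, reserved byte, 32-bit hash format ID.
V2_HEADER = struct.Struct(">4sBBBBI")
SHA1_FORMAT_ID = 0x73686131  # the bytes "sha1"


def parse_v2_header(data: bytes) -> dict:
    """Parse the 12-byte v2 header (hypothetical helper, for illustration)."""
    if len(data) < V2_HEADER.size:
        raise ValueError("file too short for a v2 commit-graph header")
    signature, version, num_chunks, reach_index, reserved, hash_id = (
        V2_HEADER.unpack_from(data)
    )
    if signature != b"CGPH":
        raise ValueError("bad signature")
    if version != 2:
        raise ValueError(f"unsupported file version {version}")
    # Unlike v1, unexpected values in these two bytes are hard errors.
    if reach_index != 1:
        raise ValueError(f"unsupported reachability index version {reach_index}")
    if reserved != 0:
        raise ValueError(f"reserved byte is {reserved}, expected 0")
    if hash_id != SHA1_FORMAT_ID:
        raise ValueError(f"unsupported hash format ID {hash_id:#x}")
    return {"version": version, "num_chunks": num_chunks, "hash_id": hash_id}
```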

Where "chunks" in the v1 format has always been a non-exhaustive list of
things *where we ignore anything we don't know about*.

So given our really bad compatibility issues with any non-v1 format I
suggested "let's use a different filename". But on closer look I retract
that.

How about we instead just don't change the header? I.e.:

 * Let's just live with "1" as the marker for SHA-1.

   Yeah it would be cute to use 0x73686131 instead like "struct
   git_hash_algo", but we can live with a 1=0x73686131 ("sha1"),
   2=0x73323536 ("s256") mapping somewhere. It's not like we're going to
   be running into the 255 limit of hash algorithms Git will support any
   time soon.

 * Don't add the reachability index version *to the header* or change
   the reserved byte to be an error (see [1] again).
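The mapping from the first bullet could be as small as the table below. This is a sketch only: the table name and helper are invented here, and the value 2 for SHA-256 is just the assignment suggested above, not something any git version defines.

```python
# Hypothetical table: one-byte hash version in the v1 header -> 4-byte format ID.
HASH_VERSION_TO_FORMAT_ID = {
    1: 0x73686131,  # "sha1"
    2: 0x73323536,  # "s256" (suggested value, not an existing assignment)
}


def format_id_name(format_id: int) -> str:
    """Render a 32-bit format ID as its four ASCII characters."""
    return format_id.to_bytes(4, "big").decode("ascii")
```

So format_id_name(HASH_VERSION_TO_FORMAT_ID[1]) gives back the four characters "sha1" without the header having to carry them.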

Instead we just add these things to new "chunks" as appropriate. As this
patch of mine shows we can easily do that, and it doesn't error out on
any existing version of git:
https://github.com/avar/git/commit/3fca63e12a9d38867d4bc0a8a25d419c00a09d95

I now can't imagine a situation where we'd ever need to change the
format. We have 32 bits of chunk ids to play with, and can have 255 of
these chunks at a time, and unknown chunks are ignored by existing
versions and future versions.
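A sketch of why unknown chunks are harmless to a reader. It assumes the chunk lookup table layout from git's commit-graph format documentation (12-byte entries: a 4-byte chunk ID followed by an 8-byte file offset), which is not spelled out in this mail; the function name and the set of "known" IDs are illustrative.

```python
import struct

# Chunk IDs this illustrative reader understands.
KNOWN_CHUNKS = {b"OIDF", b"OIDL", b"CDAT", b"EDGE"}

# Assumed table entry layout: 4-byte chunk ID, 8-byte file offset.
ENTRY = struct.Struct(">4sQ")


def read_chunk_table(data: bytes, num_chunks: int, header_size: int = 8):
    """Return (known, ignored): offsets of known chunks, IDs of skipped ones."""
    known = {}
    ignored = []
    for i in range(num_chunks):
        chunk_id, offset = ENTRY.unpack_from(data, header_size + i * ENTRY.size)
        if chunk_id in KNOWN_CHUNKS:
            known[chunk_id] = offset
        else:
            # An unknown ID is skipped rather than treated as an error, so a
            # newer writer can add chunks without breaking this reader.
            ignored.append(chunk_id)
    return known, ignored
```

A file carrying an extra chunk with a made-up ID such as "XTRA" would parse exactly as before; the reader just never looks at that offset.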

We can even have more than 255 if it comes to that by having a special
"extension" chunk, or even use the existing reserved byte for that and
pull the nasty trick of putting another set after the existing file
checksum, but I digress.

If we ever find that we e.g. don't want to write SHA-1 data anymore but
just want SHA-256, we just write a tiny amount of dummy data. Older git
versions will shrug at what looks like really incomplete commit-graph
data, but newer versions will know the real data is in some other chunk
they know about.

Ditto this "gen numbers or adjusted timestamps?" plan in
https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/
We can add a chunk of adjusted timestamps alongside the generation number
chunk, and even start adding chunks of other side-data, e.g. the path
bloom filters...

This E-Mail needs to stop at some point, but as a brief aside I don't
see how this die("commit-graph hash algorithm does not match current
algorithm") plan makes sense either.

The hash-function-transition.txt plan describes how we'll have an index
of SHA-1<->SHA-256 object names. Why would it be an error to read a
SHA-1 commit-graph under SHA-256? Won't we just say "oh this graph lists
the SHA-1s" and then use that lookup table to resolve them as SHA-256s?

And as discussed, once we go for extending things with chunks we can do a
lot better. We'd just keep the SHA-1 data segment, and perhaps (for
"just commits" cache locality) have another chunk of SHA-256 mappings to
those SHA-1s in the commit-graph file.

1. See feedback on the v2 patch in
   https://public-inbox.org/git/87muk6q98k.fsf@evledraar.gmail.com/
