git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, avarab@gmail.com,
	git@jeffhostetler.com, jrnieder@google.com, steadmon@google.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v3 01/14] commit-graph: document commit-graph chains
Date: Wed, 5 Jun 2019 14:09:47 -0400	[thread overview]
Message-ID: <62612a31-7649-5410-3732-ff9b6b61da31@gmail.com> (raw)
In-Reply-To: <xmqqmuiwrlpz.fsf@gitster-ct.c.googlers.com>

On 6/5/2019 1:22 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Add a basic description of commit-graph chains. More details about the
>> feature will be added as we add functionality. This introduction gives a
>> high-level overview to the goals of the feature and the basic layout of
>> commit-graph chains.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  Documentation/technical/commit-graph.txt | 59 ++++++++++++++++++++++++
>>  1 file changed, 59 insertions(+)
>>
>> diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
>> index fb53341d5e..1dca3bd8fe 100644
>> --- a/Documentation/technical/commit-graph.txt
>> +++ b/Documentation/technical/commit-graph.txt
>> @@ -127,6 +127,65 @@ Design Details
>>    helpful for these clones, anyway. The commit-graph will not be read or
>>    written when shallow commits are present.
>>  
>> +Commit Graphs Chains
>> +--------------------
>> +
>> +Typically, repos grow with near-constant velocity (commits per day). Over time,
>> +the number of commits added by a fetch operation is much smaller than the
>> +number of commits in the full history. By creating a "chain" of commit-graphs,
>> +we enable fast writes of new commit data without rewriting the entire commit
>> +history -- at least, most of the time.
>> +
>> +## File Layout
>> +
>> +A commit-graph chain uses multiple files, and we use a fixed naming convention
>> +to organize these files. Each commit-graph file has a name
>> +`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex-
>> +valued hash stored in the footer of that file (which is a hash of the file's
>> +contents before that hash). For a chain of commit-graph files, a plain-text
>> +file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the
>> +hashes for the files in order from "lowest" to "highest".
>> +
>> +For example, if the `commit-graph-chain` file contains the lines
>> +
>> +```
>> +	{hash0}
>> +	{hash1}
>> +	{hash2}
>> +```
>> +
>> +then the commit-graph chain looks like the following diagram:
>> +
>> + +-----------------------+
>> + |  graph-{hash2}.graph  |
>> + +-----------------------+
>> +	  |
>> + +-----------------------+
>> + |                       |
>> + |  graph-{hash1}.graph  |
>> + |                       |
>> + +-----------------------+
>> +	  |
>> + +-----------------------+
>> + |                       |
>> + |                       |
>> + |                       |
>> + |  graph-{hash0}.graph  |
>> + |                       |
>> + |                       |
>> + |                       |
>> + +-----------------------+
>> +
>> +Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of
>> +commits in `graph-{hash1}.graph`, and X2 be the number of commits in
>> +`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`,
>> +then we interpret this as being the commit in position (X0 + X1 + i), and that
>> +will be used as its "graph position". The commits in `graph-{hash2}.graph` use these
>> +positions to refer to their parents, which may be in `graph-{hash1}.graph` or
>> +`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking
>> +its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 +
>> +X2).
> 
> One thing that I fail to read from the above is what it means for
> graphs to be inside a single chain.  What is the significance for a
> graph file graph-{hash1}.graph to be between graph-{hash0}.graph and
> graph-{hash2}.graph?   For example, is any of the following true?
> 
>  - For a commit in graph-{hash1}.graph file, if graph->{hash0}.graph
>    or any other graph files lower in the position in the chain were
>    unavailable, information on some ancestor of that commit may not
>    be available.

Not only that, but if graph-{hash0}.graph is unavailable, then the
graph-{hash1}.graph file is _unusable_. For example, we don't track
the number of commits in the base graph, so we could not accurately
find the commit-parent links even within graph-{hash1}.graph.

>  - Even if graph-{hash2}.graph or any other graph files higher in
>    the position in the chain gets lost, information on a commit in
>    graph-{hash1}.graph file or any of its ancestors is not affected.

This is correct. As we "build from the bottom" we can stop if we fail
to find a file, and all the data we already accessed is still valid.

> Another thing I've assumed to be true but cannot read from the above
> description is that the hashes in `commit-graph-chain` file, other
> than the newest one, are merely redundant information, and each
> graph file records the hash of its "previous" graph file (i.e. the
> one that used to be the youngest before it got created).

If the entire chain is available, then we only really need the tip hash.
However, having the entire chain in this file allows the "building from
the bottom" pattern so we can get some value even if the tip was removed
from under us. Since we expect the base file to change infrequently, this
should cover a large number of commits.

I can try to make this pattern more clear in a future revision, assuming
we stick with the pattern. It remains unclear if this strategy as a whole
has been accepted as a good direction.

Thanks,
-Stolee

  reply	other threads:[~2019-06-05 18:09 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-08 15:53 [PATCH 00/17] [RFC] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 01/17] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 02/17] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 03/17] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 04/17] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 05/17] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 06/17] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 07/17] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 08/17] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 09/17] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 10/17] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 11/17] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 12/17] Documentation: describe split commit-graphs Derrick Stolee via GitGitGadget
2019-05-08 17:20   ` SZEDER Gábor
2019-05-08 19:00     ` Derrick Stolee
2019-05-08 20:11       ` Ævar Arnfjörð Bjarmason
2019-05-09  4:49         ` Junio C Hamano
2019-05-09 12:25           ` Derrick Stolee
2019-05-09 13:45         ` Derrick Stolee
2019-05-09 15:48           ` Ævar Arnfjörð Bjarmason
2019-05-09 17:08             ` Derrick Stolee
2019-05-09 21:45               ` Ævar Arnfjörð Bjarmason
2019-05-10 12:44                 ` Derrick Stolee
2019-05-08 15:53 ` [PATCH 13/17] commit-graph: lay groundwork for incremental files Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 14/17] commit-graph: load split commit-graph files Derrick Stolee via GitGitGadget
2019-05-08 15:54 ` [PATCH 15/17] commit-graph: write " Derrick Stolee via GitGitGadget
2019-05-08 15:54 ` [PATCH 16/17] commit-graph: add --split option Derrick Stolee via GitGitGadget
2019-05-08 15:54 ` [PATCH 17/17] fetch: add fetch.writeCommitGraph config setting Derrick Stolee via GitGitGadget
2019-05-09  8:07   ` Ævar Arnfjörð Bjarmason
2019-05-09 14:21     ` Derrick Stolee
2019-05-08 19:27 ` [PATCH 00/17] [RFC] Commit-graph: Write incremental files Ævar Arnfjörð Bjarmason
2019-05-22 19:53 ` [PATCH v2 00/11] " Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 01/11] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 02/11] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 03/11] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 04/11] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 05/11] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 06/11] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 08/11] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-05-27 11:28     ` SZEDER Gábor
2019-05-22 19:53   ` [PATCH v2 07/11] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 09/11] commit-graph: merge " Derrick Stolee via GitGitGadget
2019-05-23  0:43     ` Ævar Arnfjörð Bjarmason
2019-05-23 13:00       ` Derrick Stolee
2019-05-22 19:53   ` [PATCH v2 10/11] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 11/11] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-03 16:03   ` [PATCH v3 00/14] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 01/14] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-05 17:22       ` Junio C Hamano
2019-06-05 18:09         ` Derrick Stolee [this message]
2019-06-06 12:10       ` Philip Oakley
2019-06-06 17:09         ` Derrick Stolee
2019-06-06 21:59           ` Philip Oakley
2019-06-03 16:03     ` [PATCH v3 02/14] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 03/14] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 04/14] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 06/14] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 05/14] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 07/14] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 08/14] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 09/14] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 10/14] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 11/14] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-03 16:04     ` [PATCH v3 13/14] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-03 16:04     ` [PATCH v3 12/14] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-03 16:04     ` [PATCH v3 14/14] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-06 14:15     ` [PATCH v4 00/14] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 01/14] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 02/14] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-06-06 15:19         ` Philip Oakley
2019-06-06 21:28         ` Junio C Hamano
2019-06-07 12:44           ` Derrick Stolee
2019-06-06 14:15       ` [PATCH v4 03/14] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 04/14] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 22:20         ` Junio C Hamano
2019-06-07 12:53           ` Derrick Stolee
2019-06-06 14:15       ` [PATCH v4 05/14] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-07 18:15         ` Junio C Hamano
2019-06-06 14:15       ` [PATCH v4 06/14] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-07 18:23         ` Junio C Hamano
2019-06-06 14:15       ` [PATCH v4 07/14] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 08/14] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-07 21:57         ` Junio C Hamano
2019-06-11 12:51           ` Derrick Stolee
2019-06-11 19:45             ` Junio C Hamano
2019-06-06 14:15       ` [PATCH v4 09/14] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 10/14] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-06 17:00         ` Philip Oakley
2019-06-06 14:15       ` [PATCH v4 11/14] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 12/14] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-06 18:41         ` Ramsay Jones
2019-06-06 14:15       ` [PATCH v4 13/14] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 14/14] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-06 16:57       ` [PATCH v4 00/14] Commit-graph: Write incremental files Junio C Hamano
2019-06-07 12:37         ` Derrick Stolee
2019-06-07 18:38       ` [PATCH v5 00/16] " Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 01/16] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 02/16] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 03/16] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 04/16] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-06-10 21:47           ` Junio C Hamano
2019-06-10 23:41             ` Derrick Stolee
2019-06-07 18:38         ` [PATCH v5 05/16] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 07/16] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 06/16] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 08/16] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 09/16] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 10/16] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 11/16] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 13/16] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 12/16] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 14/16] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 15/16] commit-graph: test octopus merges with --split Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 16/16] commit-graph: test --split across alternate without --split Derrick Stolee via GitGitGadget
2019-06-17 15:02         ` [PATCH] commit-graph: normalize commit-graph filenames Derrick Stolee
2019-06-17 15:07           ` Derrick Stolee
2019-06-17 18:07           ` [PATCH v2] " Derrick Stolee
2019-06-18 18:14         ` [PATCH v6 00/18] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 01/18] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 03/18] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 02/18] commit-graph: prepare for commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 04/18] commit-graph: load " Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 05/18] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 06/18] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 07/18] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 08/18] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 09/18] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 10/18] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 11/18] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 12/18] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 13/18] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 14/18] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 15/18] commit-graph: test octopus merges with --split Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 16/18] commit-graph: test --split across alternate without --split Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 18/18] commit-graph: test verify across alternates Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 17/18] commit-graph: normalize commit-graph filenames Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=62612a31-7649-5410-3732-ff9b6b61da31@gmail.com \
    --to=stolee@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jrnieder@google.com \
    --cc=peff@peff.net \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).