git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: peff@peff.net, avarab@gmail.com, git@jeffhostetler.com,
	jrnieder@google.com, steadmon@google.com,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH 00/17] [RFC] Commit-graph: Write incremental files
Date: Wed, 08 May 2019 08:53:47 -0700 (PDT)	[thread overview]
Message-ID: <pull.184.git.gitgitgadget@gmail.com> (raw)

This patch series is marked as RFC quality because it is missing some key
features and tests, but hopefully starts a concrete discussion of how the
incremental commit-graph writes can work. If this is a good direction, then
it would replace ds/commit-graph-format-v2.

The commit-graph is a valuable performance feature for repos with large
commit histories, but suffers from the same problem as git repack: it
rewrites the entire file every time. This can be slow when there are
millions of commits, especially after we stopped reading from the
commit-graph file during a write in 43d3561 (commit-graph write: don't die
if the existing graph is corrupt).

Instead, create a "stack" of commit-graphs where the existing commit-graph 
file is "level 0" and other levels are in files 
$OBJDIR/info/commit-graphs/commit-graph-N for positive N. Each level is
closed under reachability with its lower levels, and the idea of "graph
position" now considers the concatenation of the commit orders from each
level. See PATCH 12 for more details.

When writing, we don't always want to add a new level to the stack. This
would eventually result in performance degradation, especially when
searching for a commit (before we know its graph position). We decide to
merge levels of the stack when the new commits we will write satisfy two
conditions:

 1. The expected size of the new file is more than half the size of the tip
    of the stack.
 2. The new file contains more than 64,000 commits.

The first condition alone would prevent more than a logarithmic number of
levels. The second condition is a stop-gap to prevent performance issues
when another process starts reading the commit-graph stack as we are merging
a large stack of commit-graph files. The reading process could be in a state
where the new file is not ready, but the levels above the new file were
already deleted. Thus, the commits that were merged down must be parsed from
pack-files.

The performance is necessarily amortized across multiple writes, so I tested
by writing commit-graphs from the (non-rc) tags in the Linux repo. My test
included 72 tags, and wrote everything reachable from the tag using 
--stdin-commits. Here are the overall perf numbers:

git commit-graph write --stdin-commits:         8m 12s
git commit-graph write --stdin-commits --split:    45s

The test using --split included at least six full collapses to the full
commit-graph. I believe the commit-graph stack had at most three levels
during this test.

This series is long because I felt the need to refactor write_commit_graph()
before making such a sweeping change to the format.

 * Patches 1-4: these are small changes which either fix issues or just
   provide clean-up. These are mostly borrowed from
   ds/commit-graph-format-v2.
   
   
 * Patches 5-11: these provide a non-functional refactor of
   write_commit_graph() into several methods using a "struct
   write_commit_graph_context" to share across the methods.
   
   
 * Patches 12-16: Implement the split commit-graph feature.
   
   
 * Patch 17: Demonstrate the value by writing a split commit-graph during 
   git fetch when the new config setting fetch.writeCommitGraph is true.
   
   

TODO: There are several things missing that need to be added before this
series is ready for full review and merging:

 1. The documentation for git commit-graph needs updating for the --split 
    option.
    
    
 2. We likely want config settings for the merge strategy. This is mentioned
    in the design doc, and could be saved for later.
    
    
 3. We want to update the git commit-graph verify subcommand to understand
    the commit-graph stack and optionally only verify the tip of the stack.
    This allows faster (amortized) verification if we are verifying
    immediately after writes and trusting the files at rest.
    
    
 4. It would be helpful to add a new optional chunk that contains the
    trailing hash for the lower level of the commit-graph stack. This chunk
    would only be for the commit-graph-N files, and would provide a simple
    way to check that the stack is valid on read, in case we are still
    worried about other processes reading/writing in the wrong order.
    
    
 5. Currently, --split essentially implies --append since we either (a)
    don't change the existing stack and only add commits, or (b) add all
    existing commits while merging files. However, if you would use --append 
    with --split, the append logic will trigger a merge with the current tip
    (at minimum). Some care should be taken to make this more clear.
    
    

Thanks, -Stolee

[1] 
https://github.com/git/git/commit/43d356180556180b4ef6ac232a14498a5bb2b446
commit-graph write: don't die if the existing graph is corrupt

Derrick Stolee (17):
  commit-graph: fix the_repository reference
  commit-graph: return with errors during write
  commit-graph: collapse parameters into flags
  commit-graph: remove Future Work section
  commit-graph: create write_commit_graph_context
  commit-graph: extract fill_oids_from_packs()
  commit-graph: extract fill_oids_from_commit_hex()
  commit-graph: extract fill_oids_from_all_packs()
  commit-graph: extract count_distinct_commits()
  commit-graph: extract copy_oids_to_commits()
  commit-graph: extract write_commit_graph_file()
  Documentation: describe split commit-graphs
  commit-graph: lay groundwork for incremental files
  commit-graph: load split commit-graph files
  commit-graph: write split commit-graph files
  commit-graph: add --split option
  fetch: add fetch.writeCommitGraph config setting

 Documentation/technical/commit-graph.txt | 157 +++-
 builtin/commit-graph.c                   |  31 +-
 builtin/commit.c                         |   5 +-
 builtin/fetch.c                          |  17 +
 builtin/gc.c                             |   7 +-
 commit-graph.c                           | 946 ++++++++++++++++-------
 commit-graph.h                           |  19 +-
 commit.c                                 |   2 +-
 t/t5318-commit-graph.sh                  |  28 +-
 9 files changed, 887 insertions(+), 325 deletions(-)


base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-184%2Fderrickstolee%2Fgraph%2Fincremental-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-184/derrickstolee/graph/incremental-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/184
-- 
gitgitgadget

             reply	other threads:[~2019-05-08 15:53 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-08 15:53 Derrick Stolee via GitGitGadget [this message]
2019-05-08 15:53 ` [PATCH 01/17] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 02/17] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 03/17] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 04/17] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 05/17] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 06/17] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 07/17] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 08/17] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 09/17] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 10/17] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 11/17] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 12/17] Documentation: describe split commit-graphs Derrick Stolee via GitGitGadget
2019-05-08 17:20   ` SZEDER Gábor
2019-05-08 19:00     ` Derrick Stolee
2019-05-08 20:11       ` Ævar Arnfjörð Bjarmason
2019-05-09  4:49         ` Junio C Hamano
2019-05-09 12:25           ` Derrick Stolee
2019-05-09 13:45         ` Derrick Stolee
2019-05-09 15:48           ` Ævar Arnfjörð Bjarmason
2019-05-09 17:08             ` Derrick Stolee
2019-05-09 21:45               ` Ævar Arnfjörð Bjarmason
2019-05-10 12:44                 ` Derrick Stolee
2019-05-08 15:53 ` [PATCH 13/17] commit-graph: lay groundwork for incremental files Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 14/17] commit-graph: load split commit-graph files Derrick Stolee via GitGitGadget
2019-05-08 15:54 ` [PATCH 15/17] commit-graph: write " Derrick Stolee via GitGitGadget
2019-05-08 15:54 ` [PATCH 16/17] commit-graph: add --split option Derrick Stolee via GitGitGadget
2019-05-08 15:54 ` [PATCH 17/17] fetch: add fetch.writeCommitGraph config setting Derrick Stolee via GitGitGadget
2019-05-09  8:07   ` Ævar Arnfjörð Bjarmason
2019-05-09 14:21     ` Derrick Stolee
2019-05-08 19:27 ` [PATCH 00/17] [RFC] Commit-graph: Write incremental files Ævar Arnfjörð Bjarmason
2019-05-22 19:53 ` [PATCH v2 00/11] " Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 01/11] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 02/11] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 03/11] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 05/11] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 04/11] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 06/11] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 07/11] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 08/11] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-05-27 11:28     ` SZEDER Gábor
2019-05-22 19:53   ` [PATCH v2 09/11] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-05-23  0:43     ` Ævar Arnfjörð Bjarmason
2019-05-23 13:00       ` Derrick Stolee
2019-05-22 19:53   ` [PATCH v2 10/11] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 11/11] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-03 16:03   ` [PATCH v3 00/14] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 01/14] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-05 17:22       ` Junio C Hamano
2019-06-05 18:09         ` Derrick Stolee
2019-06-06 12:10       ` Philip Oakley
2019-06-06 17:09         ` Derrick Stolee
2019-06-06 21:59           ` Philip Oakley
2019-06-03 16:03     ` [PATCH v3 02/14] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 03/14] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 04/14] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 06/14] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 05/14] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 07/14] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 08/14] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 09/14] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 10/14] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 11/14] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-03 16:04     ` [PATCH v3 13/14] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-03 16:04     ` [PATCH v3 12/14] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-03 16:04     ` [PATCH v3 14/14] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-06 14:15     ` [PATCH v4 00/14] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 01/14] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 02/14] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-06-06 15:19         ` Philip Oakley
2019-06-06 21:28         ` Junio C Hamano
2019-06-07 12:44           ` Derrick Stolee
2019-06-06 14:15       ` [PATCH v4 03/14] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 04/14] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 22:20         ` Junio C Hamano
2019-06-07 12:53           ` Derrick Stolee
2019-06-06 14:15       ` [PATCH v4 05/14] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-07 18:15         ` Junio C Hamano
2019-06-06 14:15       ` [PATCH v4 06/14] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-07 18:23         ` Junio C Hamano
2019-06-06 14:15       ` [PATCH v4 08/14] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-07 21:57         ` Junio C Hamano
2019-06-11 12:51           ` Derrick Stolee
2019-06-11 19:45             ` Junio C Hamano
2019-06-06 14:15       ` [PATCH v4 07/14] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 09/14] commit-graph: merge " Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 10/14] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-06 17:00         ` Philip Oakley
2019-06-06 14:15       ` [PATCH v4 11/14] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 12/14] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-06 18:41         ` Ramsay Jones
2019-06-06 14:15       ` [PATCH v4 13/14] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 14/14] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-06 16:57       ` [PATCH v4 00/14] Commit-graph: Write incremental files Junio C Hamano
2019-06-07 12:37         ` Derrick Stolee
2019-06-07 18:38       ` [PATCH v5 00/16] " Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 01/16] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 02/16] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 04/16] commit-graph: load " Derrick Stolee via GitGitGadget
2019-06-10 21:47           ` Junio C Hamano
2019-06-10 23:41             ` Derrick Stolee
2019-06-07 18:38         ` [PATCH v5 03/16] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 05/16] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 07/16] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 06/16] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 08/16] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 09/16] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 10/16] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 11/16] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 12/16] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 13/16] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 14/16] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 15/16] commit-graph: test octopus merges with --split Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 16/16] commit-graph: test --split across alternate without --split Derrick Stolee via GitGitGadget
2019-06-17 15:02         ` [PATCH] commit-graph: normalize commit-graph filenames Derrick Stolee
2019-06-17 15:07           ` Derrick Stolee
2019-06-17 18:07           ` [PATCH v2] " Derrick Stolee
2019-06-18 18:14         ` [PATCH v6 00/18] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 01/18] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 03/18] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 02/18] commit-graph: prepare for commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 04/18] commit-graph: load " Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 05/18] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 06/18] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 07/18] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 08/18] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 09/18] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 10/18] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 11/18] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 12/18] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 13/18] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 14/18] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 15/18] commit-graph: test octopus merges with --split Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 16/18] commit-graph: test --split across alternate without --split Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 17/18] commit-graph: normalize commit-graph filenames Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 18/18] commit-graph: test verify across alternates Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.184.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@google.com \
    --cc=peff@peff.net \
    --cc=steadmon@google.com \
    --subject='Re: [PATCH 00/17] [RFC] Commit-graph: Write incremental files' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).