From: Derrick Stolee <stolee@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, git@jeffhostetler.com,
sbeller@google.com, dstolee@microsoft.com
Subject: [PATCH 00/14] Serialized Commit Graph
Date: Thu, 25 Jan 2018 09:02:17 -0500 [thread overview]
Message-ID: <20180125140231.65604-1-dstolee@microsoft.com> (raw)
As promised [1], this patch contains a way to serialize the commit graph.
The current implementation defines a new file format to store the graph
structure (parent relationships) and basic commit metadata (commit date,
root tree OID) in order to prevent parsing raw commits while performing
basic graph walks. For example, we do not need to parse the full commit
when performing these walks:
* 'git log --topo-order -1000' walks all reachable commits to avoid
incorrect topological orders, but only needs the commit message for
the top 1000 commits.
* 'git merge-base <A> <B>' may walk many commits to find the correct
boundary between the commits reachable from A and those reachable
from B. No commit messages are needed.
* 'git branch -vv' checks ahead/behind status for all local branches
compared to their upstream remote branches. This is essentially as
hard as computing merge bases for each.
The current patch speeds up these calculations by injecting a check in
parse_commit_gently() to check if there is a graph file and using that
to provide the required metadata to the struct commit.
The file format has room to store generation numbers, which will be
provided as a patch after this framework is merged. Generation numbers
are referenced by the design document but not implemented in order to
make the current patch focus on the graph construction process. Once
that is stable, it will be easier to add generation numbers and make
graph walks aware of generation numbers one-by-one.
Here are some performance results for a copy of the Linux repository
where 'master' has 704,766 reachable commits and is behind 'origin/master'
by 19,610 commits.
| Command | Before | After | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 | 5.9s | 0.7s | -88% |
| branch -vv | 0.42s | 0.27s | -35% |
| rev-list --all | 6.4s | 1.0s | -84% |
| rev-list --all --objects | 32.6s | 27.6s | -15% |
To test this yourself, run the following on your repo:
git config core.graph true
git show-ref -s | git graph --write --update-head
The second command writes a commit graph file containing every commit
reachable from your refs. Now, all git commands that walk commits will
check your graph first before consulting the ODB. You can run your own
performance comparisions by toggling the 'core.graph' setting.
[1] https://public-inbox.org/git/d154319e-bb9e-b300-7c37-27b1dcd2a2ce@jeffhostetler.com/
Re: What's cooking in git.git (Jan 2018, #03; Tue, 23)
[2] https://github.com/derrickstolee/git/pull/2
A GitHub pull request containing the latest version of this patch.
P.S. I'm sending this patch from my gmail address to avoid Outlook
munging the URLs included in the design document.
Derrick Stolee (14):
graph: add packed graph design document
packed-graph: add core.graph setting
packed-graph: create git-graph builtin
packed-graph: add format document
packed-graph: implement construct_graph()
packed-graph: implement git-graph --write
packed-graph: implement git-graph --read
graph: implement git-graph --update-head
packed-graph: implement git-graph --clear
packed-graph: teach git-graph --delete-expired
commit: integrate packed graph with commit parsing
packed-graph: read only from specific pack-indexes
packed-graph: close under reachability
packed-graph: teach git-graph to read commits
Documentation/config.txt | 3 +
Documentation/git-graph.txt | 102 ++++
Documentation/technical/graph-format.txt | 88 ++++
Documentation/technical/packed-graph.txt | 185 +++++++
Makefile | 2 +
alloc.c | 1 +
builtin.h | 1 +
builtin/graph.c | 231 +++++++++
cache.h | 1 +
command-list.txt | 1 +
commit.c | 20 +-
commit.h | 2 +
config.c | 5 +
environment.c | 1 +
git.c | 1 +
log-tree.c | 3 +-
packed-graph.c | 840 +++++++++++++++++++++++++++++++
packed-graph.h | 65 +++
packfile.c | 4 +-
packfile.h | 2 +
t/t5319-graph.sh | 271 ++++++++++
21 files changed, 1822 insertions(+), 7 deletions(-)
create mode 100644 Documentation/git-graph.txt
create mode 100644 Documentation/technical/graph-format.txt
create mode 100644 Documentation/technical/packed-graph.txt
create mode 100644 builtin/graph.c
create mode 100644 packed-graph.c
create mode 100644 packed-graph.h
create mode 100755 t/t5319-graph.sh
--
2.16.0
next reply other threads:[~2018-01-25 14:02 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-25 14:02 Derrick Stolee [this message]
2018-01-25 14:02 ` [PATCH 01/14] graph: add packed graph design document Derrick Stolee
2018-01-25 20:04 ` Stefan Beller
2018-01-26 12:49 ` Derrick Stolee
2018-01-26 18:17 ` Stefan Beller
2018-01-25 21:14 ` Junio C Hamano
2018-01-26 13:06 ` Derrick Stolee
2018-01-26 14:13 ` Duy Nguyen
2018-01-25 14:02 ` [PATCH 02/14] packed-graph: add core.graph setting Derrick Stolee
2018-01-25 20:17 ` Stefan Beller
2018-01-25 20:40 ` Derrick Stolee
2018-01-25 21:43 ` Junio C Hamano
2018-01-26 13:08 ` Derrick Stolee
2018-01-25 14:02 ` [PATCH 03/14] packed-graph: create git-graph builtin Derrick Stolee
2018-01-25 21:45 ` Stefan Beller
2018-01-26 13:13 ` Derrick Stolee
2018-01-25 23:01 ` Junio C Hamano
2018-01-26 13:14 ` Derrick Stolee
2018-01-26 14:16 ` Duy Nguyen
2018-01-25 14:02 ` [PATCH 04/14] packed-graph: add format document Derrick Stolee
2018-01-25 22:06 ` Junio C Hamano
2018-01-25 22:18 ` Stefan Beller
2018-01-25 22:29 ` Junio C Hamano
2018-01-26 13:22 ` Derrick Stolee
2018-01-25 22:07 ` Stefan Beller
2018-01-26 13:25 ` Derrick Stolee
2018-01-25 14:02 ` [PATCH 05/14] packed-graph: implement construct_graph() Derrick Stolee
2018-01-25 23:21 ` Stefan Beller
2018-01-26 20:47 ` Junio C Hamano
2018-01-26 20:55 ` Junio C Hamano
2018-01-26 21:14 ` Andreas Schwab
2018-01-26 22:04 ` Junio C Hamano
2018-01-25 14:02 ` [PATCH 06/14] packed-graph: implement git-graph --write Derrick Stolee
2018-01-25 23:28 ` Stefan Beller
2018-01-26 13:28 ` Derrick Stolee
2018-01-25 14:02 ` [PATCH 07/14] packed-graph: implement git-graph --read Derrick Stolee
2018-01-25 14:02 ` [PATCH 08/14] graph: implement git-graph --update-head Derrick Stolee
2018-01-25 14:02 ` [PATCH 09/14] packed-graph: implement git-graph --clear Derrick Stolee
2018-01-25 23:35 ` Stefan Beller
2018-01-25 14:02 ` [PATCH 10/14] packed-graph: teach git-graph --delete-expired Derrick Stolee
2018-01-25 14:02 ` [PATCH 11/14] commit: integrate packed graph with commit parsing Derrick Stolee
2018-01-26 19:38 ` Stefan Beller
2018-01-25 14:02 ` [PATCH 12/14] packed-graph: read only from specific pack-indexes Derrick Stolee
2018-01-25 14:02 ` [PATCH 13/14] packed-graph: close under reachability Derrick Stolee
2018-01-25 14:02 ` [PATCH 14/14] packed-graph: teach git-graph to read commits Derrick Stolee
2018-01-25 15:46 ` [PATCH 00/14] Serialized Commit Graph Ævar Arnfjörð Bjarmason
2018-01-25 16:09 ` Derrick Stolee
2018-01-25 23:06 ` Ævar Arnfjörð Bjarmason
2018-01-26 12:15 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180125140231.65604-1-dstolee@microsoft.com \
--to=stolee@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=sbeller@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).