From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Derrick Stolee" <stolee@gmail.com>,
"Git mailing list" <git@vger.kernel.org>,
"Stefan Beller" <sbeller@google.com>,
"Ramsay Jones" <ramsay@ramsayjones.plus.com>,
git@jeffhostetler.com, "Derrick Stolee" <dstolee@microsoft.com>
Subject: Re: [PATCH v6 00/14] Serialized Git Commit Graph
Date: Fri, 16 Mar 2018 16:49:45 -0400 [thread overview]
Message-ID: <20180316204945.GA12333@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqq370z3m5o.fsf@gitster-ct.c.googlers.com>
On Fri, Mar 16, 2018 at 11:33:55AM -0700, Junio C Hamano wrote:
> It is not so surprising that history walking runs rings around
> enumerating objects in packfiles, if packfiles are built well.
>
> A well-built packfile tends to has newer objects in base form and
> has delta that goes in backward direction (older objects are
> represented as delta against newer ones). This helps warlking from
> the tips of the history quite a bit, because your delta base cache
> will tend to have the base object (i.e. objects in the newer part of
> the history you just walked) that will be required to access the
> "next" older part of the history more often than not.
>
> Trying to read the objects in the pack in their object name order
> would essentially mean reading them in a cryptgraphically random
> order. Half the time you will end up wanting to access an object
> that is near the tip of a very deep delta chain even before you've
> accessed any of the base objects in the delta chain.
I coincidentally was doing some experiments in this area a few weeks
ago, and found a few things:
1. The ordering makes a _huge_ difference for accessing trees and
blobs.
2. Pack order (not pack-idx order) is actually the best order, since
it tends to follow the delta patterns (it's close to traversal
order, but packs delta families more tightly).
3. None of this really matters for commits, since we almost never
store them as deltas anyway.
Here are a few experiments people can do themselves to demonstrate (my
numbers here are all from linux.git, which is sort of a wort-case
for bad ordering because its size stresses the default delta cache):
[every object in sha1 order: slow]
$ time git cat-file --batch-all-objects --batch >/dev/null
real 8m44.041s
user 8m31.359s
sys 0m12.262s
[every object from a traversal: faster, but --objects traversals are
actually CPU heavy due to all of the hash lookups for each tree. Note
not just wall-clock time but the CPU since it's split across two
processes]
$ time git rev-list --objects --all |
cut -d' ' -f2 |
git cat-file --batch >/dev/null
real 1m2.667s
user 0m58.537s
sys 0m32.392s
[every object in pack order: fastest. This is due to skipping the
traversal overhead, and should use our delta cache quite efficiently.
I'm assuming a single pack and no loose objects here, but the
performance should generalize since accessing the "big" pack
dominates]
$ time git show-index <$(ls .git/objects/pack/*.idx) |
sort -n |
cut -d' ' -f2 |
git cat-file --batch >/dev/null
real 0m51.718s
user 0m50.963s
sys 0m7.068s
[just commits, sha1 order: not horrible]
$ time git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectname)' |
grep ^commit |
cut -d' ' -f2 |
git cat-file --batch >/dev/null
real 0m8.115s
user 0m14.033s
sys 0m1.170s
[just commits, pack order: slightly worse due to the extra piping, but
obviously that could be done more quickly internally]
$ time git show-index <$(ls .git/objects/pack/*.idx) |
sort -n |
cut -d' ' -f2 |
git cat-file --batch-check='%(objecttype) %(objectname)' |
grep ^commit |
cut -d' ' -f2 |
git cat-file --batch >/dev/null
real 0m21.670s
user 0m24.867s
sys 0m9.600s
[and the reason is that hardly any commits get deltas]
$ git cat-file --batch-all-objects --batch-check='%(objecttype) %(deltabase)' |
grep ^commit >commits
$ wc -l commits
692596
$ grep -v '0000000000000000000000000000000000000000' commits | wc -l
18856
For the purposes of this patch series, I don't think the order matters
much, since we're only dealing with commits. For doing --batch-check, I
think the sha1 ordering given by "cat-file --batch-all-objects" is
convenient, and doesn't have a big impact on performance. But it's
_awful_ for --batch. I think we may want to add a sorting option to just
return the objects in the original packfile order.
-Peff
next prev parent reply other threads:[~2018-03-16 20:49 UTC|newest]
Thread overview: 110+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-27 2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
2018-02-27 2:32 ` [PATCH v5 01/13] commit-graph: add format document Derrick Stolee
2018-02-27 2:32 ` [PATCH v5 02/13] graph: add commit graph design document Derrick Stolee
2018-02-27 2:32 ` [PATCH v5 03/13] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-02-27 2:32 ` [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag Derrick Stolee
2018-03-12 13:55 ` Derrick Stolee
2018-03-13 21:42 ` Junio C Hamano
2018-03-14 2:26 ` Derrick Stolee
2018-03-14 17:00 ` Junio C Hamano
2018-02-27 2:32 ` [PATCH v5 05/13] commit-graph: implement write_commit_graph() Derrick Stolee
2018-02-27 2:33 ` [PATCH v5 06/13] commit-graph: implement 'git-commit-graph write' Derrick Stolee
2018-02-27 2:33 ` [PATCH v5 07/13] commit-graph: implement git commit-graph read Derrick Stolee
2018-02-27 2:33 ` [PATCH v5 08/13] commit-graph: add core.commitGraph setting Derrick Stolee
2018-02-27 2:33 ` [PATCH v5 09/13] commit-graph: close under reachability Derrick Stolee
2018-02-27 2:33 ` [PATCH v5 10/13] commit: integrate commit graph with commit parsing Derrick Stolee
2018-02-27 2:33 ` [PATCH v5 11/13] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-02-27 20:15 ` Stefan Beller
2018-02-27 2:33 ` [PATCH v5 12/13] commit-graph: build graph from starting commits Derrick Stolee
2018-02-27 2:33 ` [PATCH v5 13/13] commit-graph: implement "--additive" option Derrick Stolee
2018-02-27 18:50 ` [PATCH v5 00/13] Serialized Git Commit Graph Stefan Beller
2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 03/14] commit-graph: add format document Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 04/14] graph: add commit graph design document Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write' Derrick Stolee
2018-03-18 13:25 ` Ævar Arnfjörð Bjarmason
2018-03-19 13:12 ` Derrick Stolee
2018-03-19 14:36 ` Ævar Arnfjörð Bjarmason
2018-03-19 18:27 ` Derrick Stolee
2018-03-19 18:48 ` Ævar Arnfjörð Bjarmason
2018-03-14 19:27 ` [PATCH v6 08/14] commit-graph: implement git commit-graph read Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 10/14] commit-graph: close under reachability Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-03-15 22:50 ` SZEDER Gábor
2018-03-19 13:13 ` Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 13/14] commit-graph: build graph from starting commits Derrick Stolee
2018-03-14 19:27 ` [PATCH v6 14/14] commit-graph: implement "--additive" option Derrick Stolee
2018-03-14 20:10 ` [PATCH v6 00/14] Serialized Git Commit Graph Ramsay Jones
2018-03-14 20:43 ` Junio C Hamano
2018-03-15 17:23 ` Johannes Schindelin
2018-03-15 18:41 ` Junio C Hamano
2018-03-15 21:51 ` Ramsay Jones
2018-03-16 11:50 ` Johannes Schindelin
2018-03-16 17:27 ` Junio C Hamano
2018-03-19 11:41 ` Johannes Schindelin
2018-03-16 16:28 ` Lars Schneider
2018-03-19 13:10 ` Derrick Stolee
2018-03-16 15:06 ` Ævar Arnfjörð Bjarmason
2018-03-16 16:38 ` SZEDER Gábor
2018-03-16 18:33 ` Junio C Hamano
2018-03-16 19:48 ` SZEDER Gábor
2018-03-16 20:06 ` Jeff King
2018-03-16 20:19 ` Jeff King
2018-03-19 12:55 ` Derrick Stolee
2018-03-20 1:17 ` Derrick Stolee
2018-03-16 20:49 ` Jeff King [this message]
2018-04-02 20:34 ` [PATCH v7 " Derrick Stolee
2018-04-02 20:34 ` [PATCH v7 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
2018-04-02 20:34 ` [PATCH v7 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
2018-04-07 22:59 ` Jakub Narebski
2018-04-02 20:34 ` [PATCH v7 03/14] commit-graph: add format document Derrick Stolee
2018-04-07 23:49 ` Jakub Narebski
2018-04-02 20:34 ` [PATCH v7 04/14] graph: add commit graph design document Derrick Stolee
2018-04-08 11:06 ` Jakub Narebski
2018-04-02 20:34 ` [PATCH v7 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-04-02 20:34 ` [PATCH v7 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
2018-04-02 20:34 ` [PATCH v7 07/14] commit-graph: implement git-commit-graph write Derrick Stolee
2018-04-08 11:59 ` Jakub Narebski
2018-04-02 20:34 ` [PATCH v7 08/14] commit-graph: implement git commit-graph read Derrick Stolee
2018-04-02 21:33 ` Junio C Hamano
2018-04-03 11:49 ` Derrick Stolee
2018-04-08 12:59 ` Jakub Narebski
2018-04-02 20:34 ` [PATCH v7 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
2018-04-08 13:39 ` Jakub Narebski
2018-04-02 20:34 ` [PATCH v7 10/14] commit-graph: close under reachability Derrick Stolee
2018-04-02 20:34 ` [PATCH v7 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
2018-04-02 20:34 ` [PATCH v7 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-04-02 20:34 ` [PATCH v7 13/14] commit-graph: build graph from starting commits Derrick Stolee
2018-04-08 13:50 ` Jakub Narebski
2018-04-02 20:34 ` [PATCH v7 14/14] commit-graph: implement "--additive" option Derrick Stolee
2018-04-05 8:27 ` SZEDER Gábor
2018-04-10 12:55 ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
2018-04-10 12:55 ` [PATCH v8 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
2018-04-10 12:55 ` [PATCH v8 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
2018-04-10 12:55 ` [PATCH v8 03/14] commit-graph: add format document Derrick Stolee
2018-04-10 19:10 ` Stefan Beller
2018-04-10 19:18 ` Derrick Stolee
2018-04-11 20:58 ` Jakub Narebski
2018-04-12 11:28 ` Derrick Stolee
2018-04-13 22:07 ` Jakub Narebski
2018-04-10 12:55 ` [PATCH v8 04/14] graph: add commit graph design document Derrick Stolee
2018-04-15 22:48 ` Jakub Narebski
2018-04-10 12:55 ` [PATCH v8 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-04-10 12:56 ` [PATCH v8 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
2018-04-10 12:56 ` [PATCH v8 07/14] commit-graph: implement git-commit-graph write Derrick Stolee
2018-04-10 12:56 ` [PATCH v8 08/14] commit-graph: implement git commit-graph read Derrick Stolee
2018-04-14 22:15 ` Jakub Narebski
2018-04-15 3:26 ` Eric Sunshine
2018-04-10 12:56 ` [PATCH v8 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
2018-04-14 18:33 ` Jakub Narebski
2018-04-10 12:56 ` [PATCH v8 10/14] commit-graph: close under reachability Derrick Stolee
2018-04-10 12:56 ` [PATCH v8 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
2018-04-10 12:56 ` [PATCH v8 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-04-10 12:56 ` [PATCH v8 13/14] commit-graph: build graph from starting commits Derrick Stolee
2018-04-10 12:56 ` [PATCH v8 14/14] commit-graph: implement "--append" option Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180316204945.GA12333@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=avarab@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=ramsay@ramsayjones.plus.com \
--cc=sbeller@google.com \
--cc=stolee@gmail.com \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).