From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: peff@peff.net, Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v4 0/7] Use generation numbers for --topo-order
Date: Tue, 16 Oct 2018 15:36:35 -0700 (PDT) [thread overview]
Message-ID: <pull.25.v4.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.25.v3.git.gitgitgadget@gmail.com>
This patch series performs a decently-sized refactoring of the revision-walk
machinery. Well, "refactoring" is probably the wrong word, as I don't
actually remove the old code. Instead, when we see certain options in the
'rev_info' struct, we redirect the commit-walk logic to a new set of methods
that distribute the workload differently. By using generation numbers in the
commit-graph, we can significantly improve 'git log --graph' commands (and
the underlying 'git rev-list --topo-order').
On the Linux repository, I got the following performance results when
comparing to the previous version with or without a commit-graph:
Test: git rev-list --topo-order -100 HEAD
HEAD~1, no commit-graph: 6.80 s
HEAD~1, w/ commit-graph: 0.77 s
HEAD, w/ commit-graph: 0.02 s
Test: git rev-list --topo-order -100 HEAD -- tools
HEAD~1, no commit-graph: 9.63 s
HEAD~1, w/ commit-graph: 6.06 s
HEAD, w/ commit-graph: 0.06 s
If you want to read this series but are unfamiliar with the commit-graph and
generation numbers, then I recommend reading
Documentation/technical/commit-graph.txt or a blob post [1] I wrote on the
subject. In particular, the three-part walk described in "revision.c:
refactor basic topo-order logic" is present (but underexplained) as an
animated PNG [2].
Since revision.c is an incredibly important (and old) portion of the
codebase -- and because there are so many orthogonal options in 'struct
rev_info' -- I consider this submission to be "RFC quality". That is, I am
not confident that I am not missing anything, or that my solution is the
best it can be. I did merge this branch with ds/commit-graph-with-grafts and
the "DO-NOT-MERGE: write and read commit-graph always" commit that computes
a commit-graph with every 'git commit' command. The test suite passed with
that change, available on GitHub [3]. To ensure that I cover at least the
case I think are interesting, I added tests to t6600-test-reach.sh to verify
the walks report the correct results for the three cases there (no
commit-graph, full commit-graph, and a partial commit-graph so the walk
starts at GENERATION_NUMBER_INFINITY).
One notable case that is not included in this series is the case of a
history comparison such as 'git rev-list --topo-order A..B'. The existing
code in limit_list() has ways to cut the walk short when all pending commits
are UNINTERESTING. Since this code depends on commit_list instead of the
prio_queue we are using here, I chose to leave it untouched for now. We can
revisit it in a separate series later. Since handle_commit() turns on
revs->limited when a commit is UNINTERESTING, we do not hit the new code in
this case. Removing this 'revs->limited = 1;' line yields correct results,
but the performance is worse.
This series was based on ds/reachable, but is now based on 'master' to not
conflict with 182070 "commit: use timestamp_t for author_date_slab". There
is a small conflict with md/filter-trees, because it renamed a flag in
revisions.h in the line before I add new flags. Hopefully this conflict is
not too difficult to resolve.
Changes in V3: I added a new patch that updates the tab-alignment for flags
in revision.h before adding new ones (Thanks, Ævar!). Also, I squashed the
recommended changes to run_three_modes and test_three_modes from Szeder and
Junio. Thanks!
Changes in V4: I'm sending a V4 to respond to the feedback so far. Still
looking forward to more on the really big commit!
* Removed the whitespace changes to the flags in revision.c that caused
merge pain.
* The prio-queue peek function is now covered by tests when in "stack"
mode.
* The "add_parents_to_list()" function is now renamed to
"process_parents()"
* Added a new commit that expands test coverage with alternate orders and
file history (use GIT_TEST_COMMIT_GRAPH to have
t6012-rev-list-simplify.sh cover the new logic). These tests found a
problem with author dates (I forgot to record them during the explore
walk).
* Commit message edits.
Thanks, -Stolee
[1]
https://blogs.msdn.microsoft.com/devops/2018/07/09/supercharging-the-git-commit-graph-iii-generations/
Supercharging the Git Commit Graph III: Generations and Graph Algorithms
[2]
https://msdnshared.blob.core.windows.net/media/2018/06/commit-graph-topo-order-b-a.png
Animation showing three-part walk
[3] https://github.com/derrickstolee/git/tree/topo-order/testA branch
containing this series along with commits to compute commit-graph in entire
test suite.
Cc: avarab@gmail.comCc: szeder.dev@gmail.com
Derrick Stolee (7):
prio-queue: add 'peek' operation
test-reach: add run_three_modes method
test-reach: add rev-list tests
revision.c: begin refactoring --topo-order logic
commit/revisions: bookkeeping before refactoring
revision.c: generation-based topo-order algorithm
t6012: make rev-list tests more interesting
commit.c | 11 +-
commit.h | 8 ++
object.h | 4 +-
prio-queue.c | 9 ++
prio-queue.h | 6 +
revision.c | 245 +++++++++++++++++++++++++++++++++--
revision.h | 6 +
t/helper/test-prio-queue.c | 26 ++--
t/t0009-prio-queue.sh | 14 ++
t/t6012-rev-list-simplify.sh | 45 +++++--
t/t6600-test-reach.sh | 96 +++++++++++++-
11 files changed, 430 insertions(+), 40 deletions(-)
base-commit: 2d3b1c576c85b7f5db1f418907af00ab88e0c303
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-25%2Fderrickstolee%2Ftopo-order%2Fprogress-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-25/derrickstolee/topo-order/progress-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/25
Range-diff vs v3:
1: cc1ec4c270 ! 1: 2358cfd5ed prio-queue: add 'peek' operation
@@ -8,7 +8,9 @@
add it as prio_queue_peek().
Add a reference-level comparison in t/helper/test-prio-queue.c
- so this method is exercised by t0009-prio-queue.sh.
+ so this method is exercised by t0009-prio-queue.sh. Further, add
+ a test that checks the behavior when the compare function is NULL
+ (i.e. the queue becomes a stack).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@@ -56,6 +58,11 @@
- if (!strcmp(*argv, "get"))
- show(prio_queue_get(&pq));
- else if (!strcmp(*argv, "dump")) {
+- int *v;
+- while ((v = prio_queue_get(&pq)))
+- show(v);
+- }
+- else {
+ if (!strcmp(*argv, "get")) {
+ void *peek = prio_queue_peek(&pq);
+ void *get = prio_queue_get(&pq);
@@ -63,6 +70,40 @@
+ BUG("peek and get results do not match");
+ show(get);
+ } else if (!strcmp(*argv, "dump")) {
- int *v;
- while ((v = prio_queue_get(&pq)))
- show(v);
++ void *peek;
++ void *get;
++ while ((peek = prio_queue_peek(&pq))) {
++ get = prio_queue_get(&pq);
++ if (peek != get)
++ BUG("peek and get results do not match");
++ show(get);
++ }
++ } else if (!strcmp(*argv, "stack")) {
++ pq.compare = NULL;
++ } else {
+ int *v = malloc(sizeof(*v));
+ *v = atoi(*argv);
+ prio_queue_put(&pq, v);
+
+diff --git a/t/t0009-prio-queue.sh b/t/t0009-prio-queue.sh
+--- a/t/t0009-prio-queue.sh
++++ b/t/t0009-prio-queue.sh
+@@
+ test_cmp expect actual
+ '
+
++cat >expect <<'EOF'
++3
++2
++6
++4
++5
++1
++8
++EOF
++test_expect_success 'stack order' '
++ test-tool prio-queue stack 8 1 5 4 6 2 3 dump >actual &&
++ test_cmp expect actual
++'
++
+ test_done
2: b2a1ade148 = 2: 3a4b68e479 test-reach: add run_three_modes method
3: b0ceb96076 = 3: 12a3f6d367 test-reach: add rev-list tests
4: fd1a0ab7cd = 4: cd9eef9688 revision.c: begin refactoring --topo-order logic
5: e86f304082 ! 5: f3e291665d commit/revisions: bookkeeping before refactoring
@@ -16,7 +16,11 @@
around the UNINTERESTING flag and other special cases depending
on the struct rev_info. Allow this method to ignore a NULL 'list'
parameter, as we will not be populating the list for our walk.
+ Also rename the method to the slightly more generic name
+ process_parents() to make clear that this method does more than
+ add to a list (and no list is required anymore).
+ Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
diff --git a/commit.c b/commit.c
@@ -28,7 +32,8 @@
-/* record author-date for each commit object */
-define_commit_slab(author_date_slab, timestamp_t);
--
++implement_shared_commit_slab(author_date_slab, timestamp_t);
+
-static void record_author_date(struct author_date_slab *author_date,
- struct commit *commit)
+void record_author_date(struct author_date_slab *author_date,
@@ -64,7 +69,7 @@
extern int check_commit_signature(const struct commit *commit, struct signature_check *sigc);
+/* record author-date for each commit object */
-+define_commit_slab(author_date_slab, timestamp_t);
++define_shared_commit_slab(author_date_slab, timestamp_t);
+
+void record_author_date(struct author_date_slab *author_date,
+ struct commit *commit);
@@ -77,6 +82,17 @@
diff --git a/revision.c b/revision.c
--- a/revision.c
+++ b/revision.c
+@@
+ *cache = new_entry;
+ }
+
+-static int add_parents_to_list(struct rev_info *revs, struct commit *commit,
+- struct commit_list **list, struct commit_list **cache_ptr)
++static int process_parents(struct rev_info *revs, struct commit *commit,
++ struct commit_list **list, struct commit_list **cache_ptr)
+ {
+ struct commit_list *parent = commit->parents;
+ unsigned left_flag;
@@
if (p->object.flags & SEEN)
continue;
@@ -97,3 +113,39 @@
}
if (revs->first_parent_only)
break;
+@@
+
+ if (revs->max_age != -1 && (commit->date < revs->max_age))
+ obj->flags |= UNINTERESTING;
+- if (add_parents_to_list(revs, commit, &list, NULL) < 0)
++ if (process_parents(revs, commit, &list, NULL) < 0)
+ return -1;
+ if (obj->flags & UNINTERESTING) {
+ mark_parents_uninteresting(commit);
+@@
+
+ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
+ {
+- if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) {
++ if (process_parents(revs, commit, &revs->commits, NULL) < 0) {
+ if (!revs->ignore_missing_links)
+ die("Failed to traverse parents of commit %s",
+ oid_to_hex(&commit->object.oid));
+@@
+ for (;;) {
+ struct commit *p = *pp;
+ if (!revs->limited)
+- if (add_parents_to_list(revs, p, &revs->commits, &cache) < 0)
++ if (process_parents(revs, p, &revs->commits, &cache) < 0)
+ return rewrite_one_error;
+ if (p->object.flags & UNINTERESTING)
+ return rewrite_one_ok;
+@@
+ try_to_simplify_commit(revs, commit);
+ else if (revs->topo_walk_info)
+ expand_topo_walk(revs, commit);
+- else if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) {
++ else if (process_parents(revs, commit, &revs->commits, NULL) < 0) {
+ if (!revs->ignore_missing_links)
+ die("Failed to traverse parents of commit %s",
+ oid_to_hex(&commit->object.oid));
6: fa6d5ef152 < -: ---------- revision.h: add whitespace in flag definitions
7: 020b2f50c5 ! 6: aa0bb2221d revision.c: refactor basic topo-order logic
@@ -1,6 +1,15 @@
Author: Derrick Stolee <dstolee@microsoft.com>
- revision.c: refactor basic topo-order logic
+ revision.c: generation-based topo-order algorithm
+
+ The current --topo-order algorithm requires walking all
+ reachable commits up front, topo-sorting them, all before
+ outputting the first value. This patch introduces a new
+ algorithm which uses stored generation numbers to
+ incrementally walk in topo-order, outputting commits as
+ we go. This can dramatically reduce the computation time
+ to write a fixed number of commits, such as when limiting
+ with "-n <N>" or filling the first page of a pager.
When running a command like 'git rev-list --topo-order HEAD',
Git performed the following steps:
@@ -139,11 +148,12 @@
frequently, including by merge commits. A less-frequently-changed
path (such as 'README') has similar end-to-end time since we need
to walk the same number of commits (before determining we do not
- have 100 hits). However, get get the benefit that the output is
+ have 100 hits). However, get the benefit that the output is
presented to the user as it is discovered, much the same as a
normal 'git log' command (no '--topo-order'). This is an improved
user experience, even if the command has the same runtime.
+ Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
diff --git a/object.h b/object.h
@@ -216,10 +226,13 @@
+ if (parse_commit_gently(c, 1) < 0)
+ return;
+
++ if (revs->sort_order == REV_SORT_BY_AUTHOR_DATE)
++ record_author_date(&info->author_date, c);
++
+ if (revs->max_age != -1 && (c->date < revs->max_age))
+ c->object.flags |= UNINTERESTING;
+
-+ if (add_parents_to_list(revs, c, NULL, NULL) < 0)
++ if (process_parents(revs, c, NULL, NULL) < 0)
+ return;
+
+ if (c->object.flags & UNINTERESTING)
@@ -366,10 +379,10 @@
static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
{
-- if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) {
+- if (process_parents(revs, commit, &revs->commits, NULL) < 0) {
+ struct commit_list *p;
+ struct topo_walk_info *info = revs->topo_walk_info;
-+ if (add_parents_to_list(revs, commit, NULL, NULL) < 0) {
++ if (process_parents(revs, commit, NULL, NULL) < 0) {
if (!revs->ignore_missing_links)
die("Failed to traverse parents of commit %s",
- oid_to_hex(&commit->object.oid));
@@ -404,9 +417,9 @@
--- a/revision.h
+++ b/revision.h
@@
- #define USER_GIVEN (1u<<25) /* given directly by the user */
- #define TRACK_LINEAR (1u<<26)
- #define ALL_REV_FLAGS (((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+ #define USER_GIVEN (1u<<25) /* given directly by the user */
+ #define TRACK_LINEAR (1u<<26)
+ #define ALL_REV_FLAGS (((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define TOPO_WALK_EXPLORED (1u<<27)
+#define TOPO_WALK_INDEGREE (1u<<28)
-: ---------- > 7: a21febe112 t6012: make rev-list tests more interesting
--
gitgitgadget
next prev parent reply other threads:[~2018-10-16 22:36 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-27 20:41 [PATCH 0/6] Use generation numbers for --topo-order Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 21:23 ` [PATCH 0/6] Use generation numbers for --topo-order Junio C Hamano
2018-09-18 4:08 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-09-18 18:02 ` SZEDER Gábor
2018-09-19 19:31 ` Junio C Hamano
2018-09-19 19:38 ` Junio C Hamano
2018-09-20 21:18 ` Junio C Hamano
2018-09-18 4:08 ` [PATCH v2 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-18 5:51 ` Ævar Arnfjörð Bjarmason
2018-09-18 6:05 ` [PATCH v2 0/6] Use generation numbers for --topo-order Ævar Arnfjörð Bjarmason
2018-09-21 15:47 ` Derrick Stolee
2018-09-21 17:39 ` [PATCH v3 0/7] " Derrick Stolee via GitGitGadget
2018-09-21 17:39 ` [PATCH v3 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-26 19:15 ` Derrick Stolee
2018-10-11 13:54 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-11 13:57 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-11 13:58 ` Jeff King
2018-10-12 4:34 ` Junio C Hamano
2018-09-21 17:39 ` [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-11 14:06 ` Jeff King
2018-10-12 6:33 ` Junio C Hamano
2018-10-12 12:32 ` Derrick Stolee
2018-10-12 16:15 ` Johannes Sixt
2018-10-13 8:05 ` Junio C Hamano
2018-09-21 17:39 ` [PATCH v3 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-11 14:21 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 6/7] revision.h: add whitespace in flag definitions Derrick Stolee via GitGitGadget
2018-09-21 17:39 ` [PATCH v3 7/7] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-27 17:57 ` Derrick Stolee
2018-10-06 16:56 ` Jakub Narebski
2018-10-11 15:35 ` Jeff King
2018-10-11 16:21 ` Derrick Stolee
2018-10-25 9:43 ` Jeff King
2018-10-25 13:00 ` Derrick Stolee
2018-10-11 22:32 ` Stefan Beller
2018-09-21 21:22 ` [PATCH v3 0/7] Use generation numbers for --topo-order Junio C Hamano
2018-10-16 22:36 ` Derrick Stolee via GitGitGadget [this message]
2018-10-16 22:36 ` [PATCH v4 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-21 10:21 ` Jakub Narebski
2018-10-21 15:28 ` Derrick Stolee
2018-10-16 22:36 ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-21 15:55 ` Jakub Narebski
2018-10-22 1:12 ` Junio C Hamano
2018-10-22 1:51 ` Derrick Stolee
2018-10-22 1:55 ` [RFC PATCH] revision.c: use new algorithm in A..B case Derrick Stolee
2018-10-25 8:28 ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Junio C Hamano
2018-10-26 20:56 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-21 21:17 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee via GitGitGadget
2018-10-22 13:37 ` Jakub Narebski
2018-10-23 13:54 ` Derrick Stolee
2018-10-26 16:55 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 7/7] t6012: make rev-list tests more interesting Derrick Stolee via GitGitGadget
2018-10-23 15:48 ` Jakub Narebski
2018-10-21 12:57 ` [PATCH v4 0/7] Use generation numbers for --topo-order Jakub Narebski
2018-11-01 5:21 ` Junio C Hamano
2018-11-01 13:49 ` Derrick Stolee
2018-11-01 23:54 ` Junio C Hamano
2018-11-01 13:46 ` [PATCH v5 " Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 1/7] prio-queue: add 'peek' operation Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 2/7] test-reach: add run_three_modes method Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 3/7] test-reach: add rev-list tests Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee
2018-11-01 15:48 ` SZEDER Gábor
2018-11-01 16:12 ` Derrick Stolee
2019-11-08 2:50 ` Mike Hommey
2019-11-11 1:07 ` Derrick Stolee
2019-11-18 23:04 ` SZEDER Gábor
2018-11-01 13:46 ` [PATCH v5 7/7] t6012: make rev-list tests more interesting Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.25.v4.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).