git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, avarab@gmail.com, szeder.dev@gmail.com,
	peff@peff.net, jnareb@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH v5 0/7] Use generation numbers for --topo-order
Date: Thu,  1 Nov 2018 13:46:16 +0000	[thread overview]
Message-ID: <20181101134623.84055-1-dstolee@microsoft.com> (raw)
In-Reply-To: <pull.25.v4.git.gitgitgadget@gmail.com>

This patch series performs a decently-sized refactoring of the
revision-walk machinery. Well, "refactoring" is probably the wrong word,
as I don't actually remove the old code. Instead, when we see certain
options in the 'rev_info' struct, we redirect the commit-walk logic to
a new set of methods that distribute the workload differently. By using
generation numbers in the commit-graph, we can significantly improve
'git log --graph' commands (and the underlying 'git rev-list --topo-order').

On the Linux repository, I got the following performance results when
comparing to the previous version with or without a commit-graph:

    Test: git rev-list --topo-order -100 HEAD
    HEAD~1, no commit-graph: 6.80 s
    HEAD~1, w/ commit-graph: 0.77 s
      HEAD, w/ commit-graph: 0.02 s

    Test: git rev-list --topo-order -100 HEAD -- tools
    HEAD~1, no commit-graph: 9.63 s
    HEAD~1, w/ commit-graph: 6.06 s
      HEAD, w/ commit-graph: 0.06 s

If you want to read this series but are unfamiliar with the commit-graph
and generation numbers, then I recommend reading
`Documentation/technical/commit-graph.txt` or a blog post [1] I wrote on
the subject. In particular, the three-part walk described in "revision.c:
refactor basic topo-order logic" is present (but underexplained) as an
animated PNG [2].

**UPDATED** Now that we have had some review and some dogfooding, I'm
removing the paragraph I had here about "RFC quality". I think this is
ready to merge!

One notable case that is not included in this series is the case of a
history comparison such as 'git rev-list --topo-order A..B'. The existing
code in limit_list() has ways to cut the walk short when all pending
commits are UNINTERESTING. Since this code depends on commit_list instead
of the prio_queue we are using here, I chose to leave it untouched for now.
We can revisit it in a separate series later. Since handle_commit() turns
on revs->limited when a commit is UNINTERESTING, we do not hit the new
code in this case. Removing this 'revs->limited = 1;' line yields correct
results, but the performance can be worse.

**UPDATED** See the discussion about Generation Number V2 [4] for more
on this topic.

Changes in V5: Thanks Jakub for feedback on the huge commit! I think
I've responded to all the code feedback. See the range-diff at the
end of this cover-page.

Thanks,
-Stolee

[1] https://blogs.msdn.microsoft.com/devops/2018/07/09/supercharging-the-git-commit-graph-iii-generations/
   Supercharging the Git Commit Graph III: Generations and Graph Algorithms

[2] https://msdnshared.blob.core.windows.net/media/2018/06/commit-graph-topo-order-b-a.png
    Animation showing three-part walk

[3] https://github.com/derrickstolee/git/tree/topo-order/test
    A branch containing this series along with commits to compute commit-graph in entire test suite.

[4] https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/
    [RFC] Generation Number v2

Note: I'm not submitting this version via GitGitGadget because it's
currently struggling with how to handle a PR in a conflict state.
The new flags in revision.h have a conflict with recent changes in
master.

Derrick Stolee (7):
  prio-queue: add 'peek' operation
  test-reach: add run_three_modes method
  test-reach: add rev-list tests
  revision.c: begin refactoring --topo-order logic
  commit/revisions: bookkeeping before refactoring
  revision.c: generation-based topo-order algorithm
  t6012: make rev-list tests more interesting

 commit.c                     |   9 +-
 commit.h                     |   7 +
 object.h                     |   4 +-
 prio-queue.c                 |   9 ++
 prio-queue.h                 |   6 +
 revision.c                   | 243 +++++++++++++++++++++++++++++++++--
 revision.h                   |   6 +
 t/helper/test-prio-queue.c   |  26 ++--
 t/t0009-prio-queue.sh        |  14 ++
 t/t6012-rev-list-simplify.sh |  45 +++++--
 t/t6600-test-reach.sh        |  96 +++++++++++++-
 11 files changed, 426 insertions(+), 39 deletions(-)


base-commit: 2d3b1c576c85b7f5db1f418907af00ab88e0c303
-- 
2.19.1.542.gc4df23f792

-->8--

1:  2358cfd5ed = 1:  7c75a56505 prio-queue: add 'peek' operation
2:  3a4b68e479 = 2:  686c4370de test-reach: add run_three_modes method
3:  12a3f6d367 = 3:  7410c00982 test-reach: add rev-list tests
4:  cd9eef9688 = 4:  5439e11e37 revision.c: begin refactoring --topo-order logic
5:  f3e291665d ! 5:  71554deb9b commit/revisions: bookkeeping before refactoring
    @@ -9,8 +9,8 @@
            compare_commits_by_author_date() in revision.c. These are used
            currently by sort_in_topological_order() in commit.c.
     
    -    2. Moving these methods to commit.h requires adding the author_slab
    -       definition to commit.h.
    +    2. Moving these methods to commit.h requires adding an author_date_slab
    +       declaration to commit.h. Consumers will need their own implementation.
     
         3. The add_parents_to_list() method in revision.c performs logic
            around the UNINTERESTING flag and other special cases depending
    @@ -31,8 +31,7 @@
      define_commit_slab(indegree_slab, int);
      
     -/* record author-date for each commit object */
    --define_commit_slab(author_date_slab, timestamp_t);
    -+implement_shared_commit_slab(author_date_slab, timestamp_t);
    + define_commit_slab(author_date_slab, timestamp_t);
      
     -static void record_author_date(struct author_date_slab *author_date,
     -			       struct commit *commit)
    @@ -69,8 +68,7 @@
      extern int check_commit_signature(const struct commit *commit, struct signature_check *sigc);
      
     +/* record author-date for each commit object */
    -+define_shared_commit_slab(author_date_slab, timestamp_t);
    -+
    ++struct author_date_slab;
     +void record_author_date(struct author_date_slab *author_date,
     +			struct commit *commit);
     +
6:  aa0bb2221d ! 6:  84c142e0bc revision.c: generation-based topo-order algorithm
    @@ -195,6 +195,7 @@
      
     -struct topo_walk_info {};
     +define_commit_slab(indegree_slab, int);
    ++define_commit_slab(author_date_slab, timestamp_t);
     +
     +struct topo_walk_info {
     +	uint32_t min_generation;
    @@ -243,12 +244,12 @@
     +}
     +
     +static void explore_to_depth(struct rev_info *revs,
    -+			     uint32_t gen)
    ++			     uint32_t gen_cutoff)
     +{
     +	struct topo_walk_info *info = revs->topo_walk_info;
     +	struct commit *c;
     +	while ((c = prio_queue_peek(&info->explore_queue)) &&
    -+	       c->generation >= gen)
    ++	       c->generation >= gen_cutoff)
     +		explore_walk_step(revs);
     +}
     +
    @@ -266,9 +267,6 @@
     +
     +	explore_to_depth(revs, c->generation);
     +
    -+	if (parse_commit_gently(c, 1) < 0)
    -+		return;
    -+
     +	for (p = c->parents; p; p = p->next) {
     +		struct commit *parent = p->item;
     +		int *pi = indegree_slab_at(&info->indegree, parent);
    @@ -285,12 +283,13 @@
     +	}
     +}
     +
    -+static void compute_indegrees_to_depth(struct rev_info *revs)
    ++static void compute_indegrees_to_depth(struct rev_info *revs,
    ++				       uint32_t gen_cutoff)
     +{
     +	struct topo_walk_info *info = revs->topo_walk_info;
     +	struct commit *c;
     +	while ((c = prio_queue_peek(&info->indegree_queue)) &&
    -+	       c->generation >= info->min_generation)
    ++	       c->generation >= gen_cutoff)
     +		indegree_walk_step(revs);
     +}
      
    @@ -305,9 +304,9 @@
     -	limit_list(revs);
     -	sort_in_topological_order(&revs->commits, revs->sort_order);
     +	init_indegree_slab(&info->indegree);
    -+	memset(&info->explore_queue, '\0', sizeof(info->explore_queue));
    -+	memset(&info->indegree_queue, '\0', sizeof(info->indegree_queue));
    -+	memset(&info->topo_queue, '\0', sizeof(info->topo_queue));
    ++	memset(&info->explore_queue, 0, sizeof(info->explore_queue));
    ++	memset(&info->indegree_queue, 0, sizeof(info->indegree_queue));
    ++	memset(&info->topo_queue, 0, sizeof(info->topo_queue));
     +
     +	switch (revs->sort_order) {
     +	default: /* REV_SORT_IN_GRAPH_ORDER */
    @@ -329,23 +328,22 @@
     +	info->min_generation = GENERATION_NUMBER_INFINITY;
     +	for (list = revs->commits; list; list = list->next) {
     +		struct commit *c = list->item;
    -+		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
    -+		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
     +
     +		if (parse_commit_gently(c, 1))
     +			continue;
    ++
    ++		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
    ++		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
    ++
     +		if (c->generation < info->min_generation)
     +			info->min_generation = c->generation;
    -+	}
     +
    -+	for (list = revs->commits; list; list = list->next) {
    -+		struct commit *c = list->item;
     +		*(indegree_slab_at(&info->indegree, c)) = 1;
     +
     +		if (revs->sort_order == REV_SORT_BY_AUTHOR_DATE)
     +			record_author_date(&info->author_date, c);
     +	}
    -+	compute_indegrees_to_depth(revs);
    ++	compute_indegrees_to_depth(revs, info->min_generation);
     +
     +	for (list = revs->commits; list; list = list->next) {
     +		struct commit *c = list->item;
    @@ -385,9 +383,8 @@
     +	if (process_parents(revs, commit, NULL, NULL) < 0) {
      		if (!revs->ignore_missing_links)
      			die("Failed to traverse parents of commit %s",
    --			    oid_to_hex(&commit->object.oid));
    -+				oid_to_hex(&commit->object.oid));
    -+	}
    + 			    oid_to_hex(&commit->object.oid));
    + 	}
     +
     +	for (p = commit->parents; p; p = p->next) {
     +		struct commit *parent = p->item;
    @@ -398,7 +395,7 @@
     +
     +		if (parent->generation < info->min_generation) {
     +			info->min_generation = parent->generation;
    -+			compute_indegrees_to_depth(revs);
    ++			compute_indegrees_to_depth(revs, info->min_generation);
     +		}
     +
     +		pi = indegree_slab_at(&info->indegree, parent);
    @@ -409,9 +406,10 @@
     +
     +		if (revs->first_parent_only)
     +			return;
    - 	}
    ++	}
      }
      
    + int prepare_revision_walk(struct rev_info *revs)
     
      diff --git a/revision.h b/revision.h
      --- a/revision.h
7:  a21febe112 = 7:  5479087812 t6012: make rev-list tests more interesting

  parent reply	other threads:[~2018-11-01 13:46 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-27 20:41 [PATCH 0/6] Use generation numbers for --topo-order Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 21:23 ` [PATCH 0/6] Use generation numbers for --topo-order Junio C Hamano
2018-09-18  4:08 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-09-18 18:02     ` SZEDER Gábor
2018-09-19 19:31       ` Junio C Hamano
2018-09-19 19:38         ` Junio C Hamano
2018-09-20 21:18           ` Junio C Hamano
2018-09-18  4:08   ` [PATCH v2 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-18  5:51     ` Ævar Arnfjörð Bjarmason
2018-09-18  6:05   ` [PATCH v2 0/6] Use generation numbers for --topo-order Ævar Arnfjörð Bjarmason
2018-09-21 15:47     ` Derrick Stolee
2018-09-21 17:39   ` [PATCH v3 0/7] " Derrick Stolee via GitGitGadget
2018-09-21 17:39     ` [PATCH v3 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-26 19:15       ` Derrick Stolee
2018-10-11 13:54       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-11 13:57       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-11 13:58       ` Jeff King
2018-10-12  4:34         ` Junio C Hamano
2018-09-21 17:39     ` [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-11 14:06       ` Jeff King
2018-10-12  6:33       ` Junio C Hamano
2018-10-12 12:32         ` Derrick Stolee
2018-10-12 16:15         ` Johannes Sixt
2018-10-13  8:05           ` Junio C Hamano
2018-09-21 17:39     ` [PATCH v3 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-11 14:21       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 6/7] revision.h: add whitespace in flag definitions Derrick Stolee via GitGitGadget
2018-09-21 17:39     ` [PATCH v3 7/7] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-27 17:57       ` Derrick Stolee
2018-10-06 16:56         ` Jakub Narebski
2018-10-11 15:35       ` Jeff King
2018-10-11 16:21         ` Derrick Stolee
2018-10-25  9:43           ` Jeff King
2018-10-25 13:00             ` Derrick Stolee
2018-10-11 22:32       ` Stefan Beller
2018-09-21 21:22     ` [PATCH v3 0/7] Use generation numbers for --topo-order Junio C Hamano
2018-10-16 22:36     ` [PATCH v4 " Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-21 10:21         ` Jakub Narebski
2018-10-21 15:28           ` Derrick Stolee
2018-10-16 22:36       ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-21 15:55         ` Jakub Narebski
2018-10-22  1:12           ` Junio C Hamano
2018-10-22  1:51             ` Derrick Stolee
2018-10-22  1:55               ` [RFC PATCH] revision.c: use new algorithm in A..B case Derrick Stolee
2018-10-25  8:28               ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Junio C Hamano
2018-10-26 20:56                 ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-21 21:17         ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee via GitGitGadget
2018-10-22 13:37         ` Jakub Narebski
2018-10-23 13:54           ` Derrick Stolee
2018-10-26 16:55             ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 7/7] t6012: make rev-list tests more interesting Derrick Stolee via GitGitGadget
2018-10-23 15:48         ` Jakub Narebski
2018-10-21 12:57       ` [PATCH v4 0/7] Use generation numbers for --topo-order Jakub Narebski
2018-11-01  5:21       ` Junio C Hamano
2018-11-01 13:49         ` Derrick Stolee
2018-11-01 23:54           ` Junio C Hamano
2018-11-01 13:46       ` Derrick Stolee [this message]
2018-11-01 13:46         ` [PATCH v5 1/7] prio-queue: add 'peek' operation Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 2/7] test-reach: add run_three_modes method Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 3/7] test-reach: add rev-list tests Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee
2018-11-01 15:48           ` SZEDER Gábor
2018-11-01 16:12             ` Derrick Stolee
2019-11-08  2:50           ` Mike Hommey
2019-11-11  1:07             ` Derrick Stolee
2019-11-18 23:04               ` SZEDER Gábor
2018-11-01 13:46         ` [PATCH v5 7/7] t6012: make rev-list tests more interesting Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181101134623.84055-1-dstolee@microsoft.com \
    --to=stolee@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).