From: Jakub Narebski <jnareb@gmail.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Junio C Hamano <gitster@pobox.com>,
Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic
Date: Sun, 21 Oct 2018 17:55:23 +0200 [thread overview]
Message-ID: <86h8hfguqc.fsf@gmail.com> (raw)
In-Reply-To: <cd9eef96885a811196ab0b851a98de4455b950ab.1539729393.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Tue, 16 Oct 2018 15:36:42 -0700 (PDT)")
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When running 'git rev-list --topo-order' and its kin, the topo_order
> setting in struct rev_info implies the limited setting. This means
> that the following things happen during prepare_revision_walk():
>
> * revs->limited implies we run limit_list() to walk the entire
> reachable set. There are some short-cuts here, such as if we
> perform a range query like 'git rev-list COMPARE..HEAD' and we
> can stop limit_list() when all queued commits are uninteresting.
And if revs->topo_order is set, then (with current implementation) we
need limit_list() to run to generate commit_list with commits to be
topologically sorted, which is done by setting revs->limited.
In short, with current code revs->topo_order implies revs->limited.
>
> * revs->topo_order implies we run sort_in_topological_order(). See
> the implementation of that method in commit.c. It implies that
> the full set of commits to order is in the given commit_list.
So the current code uses "generate list of commits, then sort it"
approach...
>
> These two methods imply that a 'git rev-list --topo-order HEAD'
> command must walk the entire reachable set of commits _twice_ before
> returning a single result.
>
> If we have a commit-graph file with generation numbers computed, then
> there is a better way.
...instead of generating commits in topological order as you go.
> This patch introduces some necessary logic
> redirection when we are in this situation.
O.K., this should make main commit smaller. All right.
> In v2.18.0, the commit-graph file contains zero-valued bytes in the
> positions where the generation number is stored in v2.19.0 and later.
> Thus, we use generation_numbers_enabled() to check if the commit-graph
> is available and has non-zero generation numbers.
>
> When setting revs->limited only because revs->topo_order is true,
> only do so if generation numbers are not available. There is no
> reason to use the new logic as it will behave similarly when all
> generation numbers are INFINITY or ZERO.
O.K. we will be using new algorithm only when there actually are some
generation numbers.
> In prepare_revision_walk(), if we have revs->topo_order but not
> revs->limited, then we trigger the new logic. It breaks the logic
> into three pieces, to fit with the existing framework:
So if revs->limited is set (but not because revs->topo_order is set),
which means A..B queries, we will be still using the old algorithm.
All right, though I wonder if it could be improved in the future
(perhaps with the help of other graph labelling / indices than
generation numbers, maybe a positive-cut index).
Do you have an idea why there is no improvement with the new code in
this case?
> 1. init_topo_walk() fills a new struct topo_walk_info in the rev_info
> struct. We use the presence of this struct as a signal to use the
> new methods during our walk. In this patch, this method simply
> calls limit_list() and sort_in_topological_order(). In the future,
> this method will set up a new data structure to perform that logic
> in-line.
>
> 2. next_topo_commit() provides get_revision_1() with the next topo-
> ordered commit in the list. Currently, this simply pops the commit
> from revs->commits.
>
> 3. expand_topo_walk() provides get_revision_1() with a way to signal
> walking beyond the latest commit. Currently, this calls
> add_parents_to_list() exactly like the old logic.
So all three new functions should perform exactly like the old logic,
isn't it?
> While this commit presents method redirection for performing the
> exact same logic as before, it allows the next commit to focus only
> on the new logic.
All right, it's logical.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> revision.c | 42 ++++++++++++++++++++++++++++++++++++++----
> revision.h | 4 ++++
> 2 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/revision.c b/revision.c
> index e18bd530e4..2dcde8a8ac 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -25,6 +25,7 @@
> #include "worktree.h"
> #include "argv-array.h"
> #include "commit-reach.h"
> +#include "commit-graph.h"
>
> volatile show_early_output_fn_t show_early_output;
>
> @@ -2454,7 +2455,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
> if (revs->diffopt.objfind)
> revs->simplify_history = 0;
>
> - if (revs->topo_order)
> + if (revs->topo_order && !generation_numbers_enabled(the_repository))
> revs->limited = 1;
All right, with --topo-order and existing generation numbers don't force
the revs->limited code (i.e. explicit not wrapped use of limit_list()).
So with --topo-order and A..B, we have revs->limited set, with
--topo-order and no generation numbers we have revs->limited set.
>
> if (revs->prune_data.nr) {
> @@ -2892,6 +2893,33 @@ static int mark_uninteresting(const struct object_id *oid,
> return 0;
> }
>
> +struct topo_walk_info {};
Nice trick with using NULL-ness of the pointer to the currently empty
struct as a boolean flag denoting whether to use new generation number
using algorithm for topological sorting.
> +
> +static void init_topo_walk(struct rev_info *revs)
> +{
> + struct topo_walk_info *info;
I guess this helper variables is here for next revisions, as we could
have made without it...
> + revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info));
> + info = revs->topo_walk_info;
> + memset(info, 0, sizeof(struct topo_walk_info));
...by using
+ memset(revs->topo_walk_info, 0, sizeof(struct topo_walk_info));
> +
> + limit_list(revs);
> + sort_in_topological_order(&revs->commits, revs->sort_order);
This is not exactly identical to the old code, which has
if (limit_list(revs) < 0)
return -1;
if (revs->topo_order)
sort_in_topological_order(&revs->commits, revs->sort_order);
We know that init_topo_walk() would be invoked, as the name implies,
only when revs->topo_order is set, but do we know that limit_list()
would not return an error?
> +}
> +
> +static struct commit *next_topo_commit(struct rev_info *revs)
> +{
> + return pop_commit(&revs->commits);
> +}
All right, identical to the old code.
> +
> +static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
> +{
> + if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) {
> + if (!revs->ignore_missing_links)
> + die("Failed to traverse parents of commit %s",
> + oid_to_hex(&commit->object.oid));
> + }
> +}
All right, identical to the old code.
While at it, should this message be marked up for translation, or is it
something so low-level (and rare) that should be kept untranslated? But
this would be better left for separate commit series, to not entangle
this one with spurious changes.
> +
> int prepare_revision_walk(struct rev_info *revs)
> {
> int i;
> @@ -2928,11 +2956,13 @@ int prepare_revision_walk(struct rev_info *revs)
> commit_list_sort_by_date(&revs->commits);
> if (revs->no_walk)
> return 0;
> - if (revs->limited)
> + if (revs->limited) {
> if (limit_list(revs) < 0)
> return -1;
> - if (revs->topo_order)
> - sort_in_topological_order(&revs->commits, revs->sort_order);
> + if (revs->topo_order)
> + sort_in_topological_order(&revs->commits, revs->sort_order);
> + } else if (revs->topo_order)
> + init_topo_walk(revs);
Previously when revs->topo_order was set, Git called
sort_in_topological_order(), because revs->limited got always set to
truthy value if revs->topo_order was true.
Now running sort_in_topological_order() is done only if revs->limited is
set (because of A..B); if it is not, init_topo_walk() is called.
All right, identical to the old code, up to checking the return value of
limit_list(), see previous comments.
> if (revs->line_level_traverse)
> line_log_filter(revs);
> if (revs->simplify_merges)
> @@ -3257,6 +3287,8 @@ static struct commit *get_revision_1(struct rev_info *revs)
>
> if (revs->reflog_info)
> commit = next_reflog_entry(revs->reflog_info);
> + else if (revs->topo_walk_info)
> + commit = next_topo_commit(revs);
> else
> commit = pop_commit(&revs->commits);
All right, identical to the old code.
> @@ -3278,6 +3310,8 @@ static struct commit *get_revision_1(struct rev_info *revs)
>
> if (revs->reflog_info)
> try_to_simplify_commit(revs, commit);
> + else if (revs->topo_walk_info)
> + expand_topo_walk(revs, commit);
> else if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) {
> if (!revs->ignore_missing_links)
> die("Failed to traverse parents of commit %s",
All right, identical to the old code.
> diff --git a/revision.h b/revision.h
> index 2b30ac270d..fd4154ff75 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -56,6 +56,8 @@ struct rev_cmdline_info {
> #define REVISION_WALK_NO_WALK_SORTED 1
> #define REVISION_WALK_NO_WALK_UNSORTED 2
>
> +struct topo_walk_info;
> +
> struct rev_info {
> /* Starting list */
> struct commit_list *commits;
> @@ -245,6 +247,8 @@ struct rev_info {
> const char *break_bar;
>
> struct revision_sources *sources;
> +
> + struct topo_walk_info *topo_walk_info;
> };
>
> int ref_excluded(struct string_list *, const char *path);
next prev parent reply other threads:[~2018-10-21 15:55 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-27 20:41 [PATCH 0/6] Use generation numbers for --topo-order Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 21:23 ` [PATCH 0/6] Use generation numbers for --topo-order Junio C Hamano
2018-09-18 4:08 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-09-18 18:02 ` SZEDER Gábor
2018-09-19 19:31 ` Junio C Hamano
2018-09-19 19:38 ` Junio C Hamano
2018-09-20 21:18 ` Junio C Hamano
2018-09-18 4:08 ` [PATCH v2 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-18 5:51 ` Ævar Arnfjörð Bjarmason
2018-09-18 6:05 ` [PATCH v2 0/6] Use generation numbers for --topo-order Ævar Arnfjörð Bjarmason
2018-09-21 15:47 ` Derrick Stolee
2018-09-21 17:39 ` [PATCH v3 0/7] " Derrick Stolee via GitGitGadget
2018-09-21 17:39 ` [PATCH v3 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-26 19:15 ` Derrick Stolee
2018-10-11 13:54 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-11 13:57 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-11 13:58 ` Jeff King
2018-10-12 4:34 ` Junio C Hamano
2018-09-21 17:39 ` [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-11 14:06 ` Jeff King
2018-10-12 6:33 ` Junio C Hamano
2018-10-12 12:32 ` Derrick Stolee
2018-10-12 16:15 ` Johannes Sixt
2018-10-13 8:05 ` Junio C Hamano
2018-09-21 17:39 ` [PATCH v3 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-11 14:21 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 6/7] revision.h: add whitespace in flag definitions Derrick Stolee via GitGitGadget
2018-09-21 17:39 ` [PATCH v3 7/7] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-27 17:57 ` Derrick Stolee
2018-10-06 16:56 ` Jakub Narebski
2018-10-11 15:35 ` Jeff King
2018-10-11 16:21 ` Derrick Stolee
2018-10-25 9:43 ` Jeff King
2018-10-25 13:00 ` Derrick Stolee
2018-10-11 22:32 ` Stefan Beller
2018-09-21 21:22 ` [PATCH v3 0/7] Use generation numbers for --topo-order Junio C Hamano
2018-10-16 22:36 ` [PATCH v4 " Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-21 10:21 ` Jakub Narebski
2018-10-21 15:28 ` Derrick Stolee
2018-10-16 22:36 ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-21 15:55 ` Jakub Narebski [this message]
2018-10-22 1:12 ` Junio C Hamano
2018-10-22 1:51 ` Derrick Stolee
2018-10-22 1:55 ` [RFC PATCH] revision.c: use new algorithm in A..B case Derrick Stolee
2018-10-25 8:28 ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Junio C Hamano
2018-10-26 20:56 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-21 21:17 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee via GitGitGadget
2018-10-22 13:37 ` Jakub Narebski
2018-10-23 13:54 ` Derrick Stolee
2018-10-26 16:55 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 7/7] t6012: make rev-list tests more interesting Derrick Stolee via GitGitGadget
2018-10-23 15:48 ` Jakub Narebski
2018-10-21 12:57 ` [PATCH v4 0/7] Use generation numbers for --topo-order Jakub Narebski
2018-11-01 5:21 ` Junio C Hamano
2018-11-01 13:49 ` Derrick Stolee
2018-11-01 23:54 ` Junio C Hamano
2018-11-01 13:46 ` [PATCH v5 " Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 1/7] prio-queue: add 'peek' operation Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 2/7] test-reach: add run_three_modes method Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 3/7] test-reach: add rev-list tests Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee
2018-11-01 15:48 ` SZEDER Gábor
2018-11-01 16:12 ` Derrick Stolee
2019-11-08 2:50 ` Mike Hommey
2019-11-11 1:07 ` Derrick Stolee
2019-11-18 23:04 ` SZEDER Gábor
2018-11-01 13:46 ` [PATCH v5 7/7] t6012: make rev-list tests more interesting Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86h8hfguqc.fsf@gmail.com \
--to=jnareb@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).