From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net,
Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic
Date: Fri, 12 Oct 2018 08:32:43 -0400 [thread overview]
Message-ID: <0407d50d-aabd-e85e-3c48-f6b6678f9884@gmail.com> (raw)
In-Reply-To: <xmqqwoqnbrmw.fsf@gitster-ct.c.googlers.com>
On 10/12/2018 2:33 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> * revs->limited implies we run limit_list() to walk the entire
>> reachable set. There are some short-cuts here, such as if we
>> perform a range query like 'git rev-list COMPARE..HEAD' and we
>> can stop limit_list() when all queued commits are uninteresting.
>>
>> * revs->topo_order implies we run sort_in_topological_order(). See
>> the implementation of that method in commit.c. It implies that
>> the full set of commits to order is in the given commit_list.
>>
>> These two methods imply that a 'git rev-list --topo-order HEAD'
>> command must walk the entire reachable set of commits _twice_ before
>> returning a single result.
> With or without "--topo-order", running rev-list without any
> negative commit means we must dig down to the roots that can be
> reached from the positive commits we have.
If we use default order in 'git log', we don't walk all the way to the
root commits, and instead trust the commit-date. (This is different than
--date-order, which does guarantee parents after children.) In this
case, revs->limited is false.
> I am to sure if having to run the "sort" of order N counts as "walk
> the entire reachable set once" (in addition to the enumeration that
> must be done to prepare that N commits, performed in limit_list()).
sort_in_topological_order() does actually _two_ walks (the in-degree
computation plus the walk that peels commits of in-degree zero), but
those walks are cheaper because we've already parsed the commits in
limit_list().
>> 3. expand_topo_walk() provides get_revision_1() with a way to signal
>> walking beyond the latest commit. Currently, this calls
>> add_parents_to_list() exactly like the old logic.
> "latest"? We dig down the history from newer to older, so at some
> point we hit an old commit and need to find the parents to keep
> walking towards even older parts of the history. Did you mean
> "earliest" instead?
I mean "latest" in terms of the algorithm, so "the commit that was
returned by get_revision_1() most recently". This could use some
rewriting for clarity.
>>
>> @@ -2454,7 +2455,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
>> if (revs->diffopt.objfind)
>> revs->simplify_history = 0;
>>
>> - if (revs->topo_order)
>> + if (revs->topo_order && !generation_numbers_enabled(the_repository))
>> revs->limited = 1;
> Are we expecting that this is always a bool? Can there be new
> commits for which generation numbers are not computed and stored
> while all the old, stable and packed commits have generation
> numbers?
For this algorithm to work, we only care that _some_ commits have
generation numbers. We expect that if a commit-graph file exists with
generation numbers, then the majority of commits have generation
numbers. The commits that were added or fetched since the commit-graph
was written will have generation number INFINITY, but the topo-order
algorithm will still work and be efficient in those cases. (This is also
why we have the "half graph" case in test_three_modes.)
>> @@ -2892,6 +2893,33 @@ static int mark_uninteresting(const struct object_id *oid,
>> return 0;
>> }
>>
>> +struct topo_walk_info {};
>> +
>> +static void init_topo_walk(struct rev_info *revs)
>> +{
>> + struct topo_walk_info *info;
>> + revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info));
>> + info = revs->topo_walk_info;
>> + memset(info, 0, sizeof(struct topo_walk_info));
> There is no member in the struct at this point. Are we sure this is
> safe? Just being curious. I know xmalloc() gives us at least one
> byte and info won't be NULL. I just do not know offhand if we have
> a guarantee that memset() acts sensibly to fill the first 0 bytes.
This is a good question. It seems to work for me when I check out your
version of this commit (6c04ff30 "revision.c: begin refactoring
--topo-order logic") and run all tests.
>> + limit_list(revs);
>> + sort_in_topological_order(&revs->commits, revs->sort_order);
>> +}
>> +
>> +static struct commit *next_topo_commit(struct rev_info *revs)
>> +{
>> + return pop_commit(&revs->commits);
>> +}
>> +
>> +static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
>> +{
>> + if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) {
>> + if (!revs->ignore_missing_links)
>> + die("Failed to traverse parents of commit %s",
>> + oid_to_hex(&commit->object.oid));
>> + }
>> +}
>> +
>> int prepare_revision_walk(struct rev_info *revs)
>> {
>> int i;
>> @@ -2928,11 +2956,13 @@ int prepare_revision_walk(struct rev_info *revs)
>> commit_list_sort_by_date(&revs->commits);
>> if (revs->no_walk)
>> return 0;
>> - if (revs->limited)
>> + if (revs->limited) {
>> if (limit_list(revs) < 0)
>> return -1;
>> - if (revs->topo_order)
>> - sort_in_topological_order(&revs->commits, revs->sort_order);
>> + if (revs->topo_order)
>> + sort_in_topological_order(&revs->commits, revs->sort_order);
>> + } else if (revs->topo_order)
>> + init_topo_walk(revs);
>> if (revs->line_level_traverse)
>> line_log_filter(revs);
>> if (revs->simplify_merges)
> The diff is a bit hard to grok around here, but
>
> - when limited *and* topo_order, we do the sort here, as we know we
> already have called limit_list(), i.e. we behave identically as
> the code before this patch in that case.
>
> - when not limited but topo_order, then we do init_topo_walk();
> currently we do limit_list() and sort_in_topological_order(),
> which means we do the same as above.
>
> As long as limit_list() and sort_in_topological_order() does not
> look at revs->limited bit, this patch cannot cause any regression.
>
>> @@ -3257,6 +3287,8 @@ static struct commit *get_revision_1(struct rev_info *revs)
>>
>> if (revs->reflog_info)
>> commit = next_reflog_entry(revs->reflog_info);
>> + else if (revs->topo_walk_info)
>> + commit = next_topo_commit(revs);
>> else
>> commit = pop_commit(&revs->commits);
> So this get_revision_1() always grabs the commit from next_topo_commit()
> when topo-order is in effect.
And specifically, when the conditions for our new topo-walk algorithm
are in effect. If the commit-graph doesn't exist, the old logic will
still go through for "git log --topo-order".
Thanks for the careful look!
-Stolee
next prev parent reply other threads:[~2018-10-12 12:32 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-27 20:41 [PATCH 0/6] Use generation numbers for --topo-order Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 21:23 ` [PATCH 0/6] Use generation numbers for --topo-order Junio C Hamano
2018-09-18 4:08 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-09-18 18:02 ` SZEDER Gábor
2018-09-19 19:31 ` Junio C Hamano
2018-09-19 19:38 ` Junio C Hamano
2018-09-20 21:18 ` Junio C Hamano
2018-09-18 4:08 ` [PATCH v2 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-18 5:51 ` Ævar Arnfjörð Bjarmason
2018-09-18 6:05 ` [PATCH v2 0/6] Use generation numbers for --topo-order Ævar Arnfjörð Bjarmason
2018-09-21 15:47 ` Derrick Stolee
2018-09-21 17:39 ` [PATCH v3 0/7] " Derrick Stolee via GitGitGadget
2018-09-21 17:39 ` [PATCH v3 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-26 19:15 ` Derrick Stolee
2018-10-11 13:54 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-11 13:57 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-11 13:58 ` Jeff King
2018-10-12 4:34 ` Junio C Hamano
2018-09-21 17:39 ` [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-11 14:06 ` Jeff King
2018-10-12 6:33 ` Junio C Hamano
2018-10-12 12:32 ` Derrick Stolee [this message]
2018-10-12 16:15 ` Johannes Sixt
2018-10-13 8:05 ` Junio C Hamano
2018-09-21 17:39 ` [PATCH v3 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-11 14:21 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 6/7] revision.h: add whitespace in flag definitions Derrick Stolee via GitGitGadget
2018-09-21 17:39 ` [PATCH v3 7/7] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-27 17:57 ` Derrick Stolee
2018-10-06 16:56 ` Jakub Narebski
2018-10-11 15:35 ` Jeff King
2018-10-11 16:21 ` Derrick Stolee
2018-10-25 9:43 ` Jeff King
2018-10-25 13:00 ` Derrick Stolee
2018-10-11 22:32 ` Stefan Beller
2018-09-21 21:22 ` [PATCH v3 0/7] Use generation numbers for --topo-order Junio C Hamano
2018-10-16 22:36 ` [PATCH v4 " Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-21 10:21 ` Jakub Narebski
2018-10-21 15:28 ` Derrick Stolee
2018-10-16 22:36 ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-21 15:55 ` Jakub Narebski
2018-10-22 1:12 ` Junio C Hamano
2018-10-22 1:51 ` Derrick Stolee
2018-10-22 1:55 ` [RFC PATCH] revision.c: use new algorithm in A..B case Derrick Stolee
2018-10-25 8:28 ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Junio C Hamano
2018-10-26 20:56 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-21 21:17 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee via GitGitGadget
2018-10-22 13:37 ` Jakub Narebski
2018-10-23 13:54 ` Derrick Stolee
2018-10-26 16:55 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 7/7] t6012: make rev-list tests more interesting Derrick Stolee via GitGitGadget
2018-10-23 15:48 ` Jakub Narebski
2018-10-21 12:57 ` [PATCH v4 0/7] Use generation numbers for --topo-order Jakub Narebski
2018-11-01 5:21 ` Junio C Hamano
2018-11-01 13:49 ` Derrick Stolee
2018-11-01 23:54 ` Junio C Hamano
2018-11-01 13:46 ` [PATCH v5 " Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 1/7] prio-queue: add 'peek' operation Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 2/7] test-reach: add run_three_modes method Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 3/7] test-reach: add rev-list tests Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee
2018-11-01 15:48 ` SZEDER Gábor
2018-11-01 16:12 ` Derrick Stolee
2019-11-08 2:50 ` Mike Hommey
2019-11-11 1:07 ` Derrick Stolee
2019-11-18 23:04 ` SZEDER Gábor
2018-11-01 13:46 ` [PATCH v5 7/7] t6012: make rev-list tests more interesting Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0407d50d-aabd-e85e-3c48-f6b6678f9884@gmail.com \
--to=stolee@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).