git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic
Date: Fri, 12 Oct 2018 08:32:43 -0400	[thread overview]
Message-ID: <0407d50d-aabd-e85e-3c48-f6b6678f9884@gmail.com> (raw)
In-Reply-To: <xmqqwoqnbrmw.fsf@gitster-ct.c.googlers.com>

On 10/12/2018 2:33 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> * revs->limited implies we run limit_list() to walk the entire
>>    reachable set. There are some short-cuts here, such as if we
>>    perform a range query like 'git rev-list COMPARE..HEAD' and we
>>    can stop limit_list() when all queued commits are uninteresting.
>>
>> * revs->topo_order implies we run sort_in_topological_order(). See
>>    the implementation of that method in commit.c. It implies that
>>    the full set of commits to order is in the given commit_list.
>>
>> These two methods imply that a 'git rev-list --topo-order HEAD'
>> command must walk the entire reachable set of commits _twice_ before
>> returning a single result.
> With or without "--topo-order", running rev-list without any
> negative commit means we must dig down to the roots that can be
> reached from the positive commits we have.
If we use default order in 'git log', we don't walk all the way to the 
root commits, and instead trust the commit-date. (This is different than 
--date-order, which does guarantee parents after children.) In this 
case, revs->limited is false.
> I am to sure if having to run the "sort" of order N counts as "walk
> the entire reachable set once" (in addition to the enumeration that
> must be done to prepare that N commits, performed in limit_list()).

sort_in_topological_order() does actually _two_ walks (the in-degree 
computation plus the walk that peels commits of in-degree zero), but 
those walks are cheaper because we've already parsed the commits in 
limit_list().
>> 3. expand_topo_walk() provides get_revision_1() with a way to signal
>>     walking beyond the latest commit. Currently, this calls
>>     add_parents_to_list() exactly like the old logic.
> "latest"?  We dig down the history from newer to older, so at some
> point we hit an old commit and need to find the parents to keep
> walking towards even older parts of the history.  Did you mean
> "earliest" instead?
I mean "latest" in terms of the algorithm, so "the commit that was 
returned by get_revision_1() most recently". This could use some 
rewriting for clarity.
>>   
>> @@ -2454,7 +2455,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
>>   	if (revs->diffopt.objfind)
>>   		revs->simplify_history = 0;
>>   
>> -	if (revs->topo_order)
>> +	if (revs->topo_order && !generation_numbers_enabled(the_repository))
>>   		revs->limited = 1;
> Are we expecting that this is always a bool?  Can there be new
> commits for which generation numbers are not computed and stored
> while all the old, stable and packed commits have generation
> numbers?

For this algorithm to work, we only care that _some_ commits have 
generation numbers. We expect that if a commit-graph file exists with 
generation numbers, then the majority of commits have generation 
numbers. The commits that were added or fetched since the commit-graph 
was written will have generation number INFINITY, but the topo-order 
algorithm will still work and be efficient in those cases. (This is also 
why we have the "half graph" case in test_three_modes.)

>> @@ -2892,6 +2893,33 @@ static int mark_uninteresting(const struct object_id *oid,
>>   	return 0;
>>   }
>>   
>> +struct topo_walk_info {};
>> +
>> +static void init_topo_walk(struct rev_info *revs)
>> +{
>> +	struct topo_walk_info *info;
>> +	revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info));
>> +	info = revs->topo_walk_info;
>> +	memset(info, 0, sizeof(struct topo_walk_info));
> There is no member in the struct at this point.  Are we sure this is
> safe?  Just being curious.  I know xmalloc() gives us at least one
> byte and info won't be NULL.  I just do not know offhand if we have
> a guarantee that memset() acts sensibly to fill the first 0 bytes.
This is a good question. It seems to work for me when I check out your 
version of this commit (6c04ff30 "revision.c: begin refactoring 
--topo-order logic") and run all tests.
>> +	limit_list(revs);
>> +	sort_in_topological_order(&revs->commits, revs->sort_order);
>> +}
>> +
>> +static struct commit *next_topo_commit(struct rev_info *revs)
>> +{
>> +	return pop_commit(&revs->commits);
>> +}
>> +
>> +static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
>> +{
>> +	if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) {
>> +		if (!revs->ignore_missing_links)
>> +			die("Failed to traverse parents of commit %s",
>> +			    oid_to_hex(&commit->object.oid));
>> +	}
>> +}
>> +
>>   int prepare_revision_walk(struct rev_info *revs)
>>   {
>>   	int i;
>> @@ -2928,11 +2956,13 @@ int prepare_revision_walk(struct rev_info *revs)
>>   		commit_list_sort_by_date(&revs->commits);
>>   	if (revs->no_walk)
>>   		return 0;
>> -	if (revs->limited)
>> +	if (revs->limited) {
>>   		if (limit_list(revs) < 0)
>>   			return -1;
>> -	if (revs->topo_order)
>> -		sort_in_topological_order(&revs->commits, revs->sort_order);
>> +		if (revs->topo_order)
>> +			sort_in_topological_order(&revs->commits, revs->sort_order);
>> +	} else if (revs->topo_order)
>> +		init_topo_walk(revs);
>>   	if (revs->line_level_traverse)
>>   		line_log_filter(revs);
>>   	if (revs->simplify_merges)
> The diff is a bit hard to grok around here, but
>
>   - when limited *and* topo_order, we do the sort here, as we know we
>     already have called limit_list(), i.e. we behave identically as
>     the code before this patch in that case.
>
>   - when not limited but topo_order, then we do init_topo_walk();
>     currently we do limit_list() and sort_in_topological_order(),
>     which means we do the same as above.
>
> As long as limit_list() and sort_in_topological_order() does not
> look at revs->limited bit, this patch cannot cause any regression.
>
>> @@ -3257,6 +3287,8 @@ static struct commit *get_revision_1(struct rev_info *revs)
>>   
>>   		if (revs->reflog_info)
>>   			commit = next_reflog_entry(revs->reflog_info);
>> +		else if (revs->topo_walk_info)
>> +			commit = next_topo_commit(revs);
>>   		else
>>   			commit = pop_commit(&revs->commits);
> So this get_revision_1() always grabs the commit from next_topo_commit()
> when topo-order is in effect.
And specifically, when the conditions for our new topo-walk algorithm 
are in effect. If the commit-graph doesn't exist, the old logic will 
still go through for "git log --topo-order".

Thanks for the careful look!
-Stolee

  reply	other threads:[~2018-10-12 12:32 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-27 20:41 [PATCH 0/6] Use generation numbers for --topo-order Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 21:23 ` [PATCH 0/6] Use generation numbers for --topo-order Junio C Hamano
2018-09-18  4:08 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-09-18 18:02     ` SZEDER Gábor
2018-09-19 19:31       ` Junio C Hamano
2018-09-19 19:38         ` Junio C Hamano
2018-09-20 21:18           ` Junio C Hamano
2018-09-18  4:08   ` [PATCH v2 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-18  5:51     ` Ævar Arnfjörð Bjarmason
2018-09-18  6:05   ` [PATCH v2 0/6] Use generation numbers for --topo-order Ævar Arnfjörð Bjarmason
2018-09-21 15:47     ` Derrick Stolee
2018-09-21 17:39   ` [PATCH v3 0/7] " Derrick Stolee via GitGitGadget
2018-09-21 17:39     ` [PATCH v3 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-26 19:15       ` Derrick Stolee
2018-10-11 13:54       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-11 13:57       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-11 13:58       ` Jeff King
2018-10-12  4:34         ` Junio C Hamano
2018-09-21 17:39     ` [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-11 14:06       ` Jeff King
2018-10-12  6:33       ` Junio C Hamano
2018-10-12 12:32         ` Derrick Stolee [this message]
2018-10-12 16:15         ` Johannes Sixt
2018-10-13  8:05           ` Junio C Hamano
2018-09-21 17:39     ` [PATCH v3 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-11 14:21       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 6/7] revision.h: add whitespace in flag definitions Derrick Stolee via GitGitGadget
2018-09-21 17:39     ` [PATCH v3 7/7] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-27 17:57       ` Derrick Stolee
2018-10-06 16:56         ` Jakub Narebski
2018-10-11 15:35       ` Jeff King
2018-10-11 16:21         ` Derrick Stolee
2018-10-25  9:43           ` Jeff King
2018-10-25 13:00             ` Derrick Stolee
2018-10-11 22:32       ` Stefan Beller
2018-09-21 21:22     ` [PATCH v3 0/7] Use generation numbers for --topo-order Junio C Hamano
2018-10-16 22:36     ` [PATCH v4 " Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-21 10:21         ` Jakub Narebski
2018-10-21 15:28           ` Derrick Stolee
2018-10-16 22:36       ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-21 15:55         ` Jakub Narebski
2018-10-22  1:12           ` Junio C Hamano
2018-10-22  1:51             ` Derrick Stolee
2018-10-22  1:55               ` [RFC PATCH] revision.c: use new algorithm in A..B case Derrick Stolee
2018-10-25  8:28               ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Junio C Hamano
2018-10-26 20:56                 ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-21 21:17         ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee via GitGitGadget
2018-10-22 13:37         ` Jakub Narebski
2018-10-23 13:54           ` Derrick Stolee
2018-10-26 16:55             ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 7/7] t6012: make rev-list tests more interesting Derrick Stolee via GitGitGadget
2018-10-23 15:48         ` Jakub Narebski
2018-10-21 12:57       ` [PATCH v4 0/7] Use generation numbers for --topo-order Jakub Narebski
2018-11-01  5:21       ` Junio C Hamano
2018-11-01 13:49         ` Derrick Stolee
2018-11-01 23:54           ` Junio C Hamano
2018-11-01 13:46       ` [PATCH v5 " Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 1/7] prio-queue: add 'peek' operation Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 2/7] test-reach: add run_three_modes method Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 3/7] test-reach: add rev-list tests Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee
2018-11-01 15:48           ` SZEDER Gábor
2018-11-01 16:12             ` Derrick Stolee
2019-11-08  2:50           ` Mike Hommey
2019-11-11  1:07             ` Derrick Stolee
2019-11-18 23:04               ` SZEDER Gábor
2018-11-01 13:46         ` [PATCH v5 7/7] t6012: make rev-list tests more interesting Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0407d50d-aabd-e85e-3c48-f6b6678f9884@gmail.com \
    --to=stolee@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).