git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v4 6/7] revision.c: generation-based topo-order algorithm
Date: Fri, 26 Oct 2018 18:55:22 +0200	[thread overview]
Message-ID: <86bm7g641x.fsf@gmail.com> (raw)
In-Reply-To: <6501930f-4097-1b81-6d0d-be54f050a5a4@gmail.com> (Derrick Stolee's message of "Tue, 23 Oct 2018 09:54:30 -0400")

Derrick Stolee <stolee@gmail.com> writes:
> On 10/22/2018 9:37 AM, Jakub Narebski wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
[...]
>> Sidenote: the new algorithm looks a bit like Unix pipeline, where each
>> step of pipeline does not output much more than next step needs /
>> requests.
>
> That's essentially the idea.

Some of the newer languages have built-in support for similar kind of
pipeline for connecting processes, be it channels in Go, supplies and
suppliers in Perl6.  I wonder if there exists some library implementing
this kind of construct in C.

That aside, I wonder if when there would be support for more
reachability indices than generation numbers, if it wouldn't be better
to pass a commit as a limiter (up to this commit), than specific indices
like current passing of generation number.  Just food for thoughs...

[...]
>>> In my local testing, I used the following Git commands on the
>>> Linux repository in three modes: HEAD~1 with no commit-graph,
>>> HEAD~1 with a commit-graph, and HEAD with a commit-graph. This
>>> allows comparing the benefits we get from parsing commits from
>>> the commit-graph and then again the benefits we get by
>>> restricting the set of commits we walk.
>>>
>>> Test: git rev-list --topo-order -100 HEAD
>>> HEAD~1, no commit-graph: 6.80 s
>>> HEAD~1, w/ commit-graph: 0.77 s
>>>    HEAD, w/ commit-graph: 0.02 s
>>>
>>> Test: git rev-list --topo-order -100 HEAD -- tools
>>> HEAD~1, no commit-graph: 9.63 s
>>> HEAD~1, w/ commit-graph: 6.06 s
>>>    HEAD, w/ commit-graph: 0.06 s
>>>
>>> This speedup is due to a few things. First, the new generation-
>>> number-enabled algorithm walks commits on order of the number of
>>> results output (subject to some branching structure expectations).
>>> Since we limit to 100 results, we are running a query similar to
>>> filling a single page of results. Second, when specifying a path,
>>> we must parse the root tree object for each commit we walk. The
>>> previous benefits from the commit-graph are entirely from reading
>>> the commit-graph instead of parsing commits. Since we need to
>>> parse trees for the same number of commits as before, we slow
>>> down significantly from the non-path-based query.
>>>
>>> For the test above, I specifically selected a path that is changed
>>> frequently, including by merge commits. A less-frequently-changed
>>> path (such as 'README') has similar end-to-end time since we need
>>> to walk the same number of commits (before determining we do not
>>> have 100 hits). However, get the benefit that the output is
>>> presented to the user as it is discovered, much the same as a
>>> normal 'git log' command (no '--topo-order'). This is an improved
>>> user experience, even if the command has the same runtime.
>>>
>> First, do I understand it correctly that in first case the gains from
>> new algorithms are so slim because with commit-graph file and no path
>> limiting we don't hit repository anyway; we walk less commits, but
>> reading commit data from commit-graph file is fast/
>
> If you mean 0.77s to 0.02s is "slim" then yes, it is because the
> commit-graph command already made a full walk of the commit history
> faster. (I'm only poking at this because the _relative_ improvement is
> significant, even if the command was already sub-second.)

First, you didn't provide us with percentages, i.e. relative improvement
(and I am lazy).  Second, 0.02s can be within the margin of error,
depending on how it is measured, and how stable this measurement is.

>> Second, I wonder if there is some easy way to perform automatic latency
>> tests, i.e. how fast does Git show the first page of output...
>
> I have talked with Jeff Hostetler about this, to see if we can have a
> "time to first page" traced with trace2, but we don't seem to have
> access to that information within Git. We would need to insert it into
> the pager. The "-100" is used instead.

Perhaps another solution to the problem of "first page of output"
latency tests could be feasible, namely create a helper test-pager-1p
"pager" program that would automatically quit after first page of
output; or perhaps even one that benchmarks each page of output
automatically.

There exists 'pv' (pipe viewer) program for pipes, so I think it would
be possible to do equivalent, but as a pager.

[...]
>>> +static inline void test_flag_and_insert(struct prio_queue *q, struct commit *c, int flag)
>>> +{
>>> +	if (c->object.flags & flag)
>>> +		return;
>>> +
>>> +	c->object.flags |= flag;
>>> +	prio_queue_put(q, c);
>>> +}
>>
>> This is an independent change, though I see that it is quite specific
>> (as opposed to quite generic prio_queue_peek() operation added earlier
>> in this series), so it does not make much sense as standalone change.
>>
>> It inserts commit into priority queue only if it didn't have flags set,
>> and sets the flag (so we won't add it to the queue again, not without
>> unsetting the flag), am I correct?
>
> Yes, this pattern of using a flag to avoid duplicate entries in the
> priority queue appears in multiple walks. It wasn't needed before. We
> call it four times in the code below.

>> I guess that we use test_flag_and_insert() instead of prio_queue_put()
>> to avoid duplicate entries in the queue.  I think the queue is initially
>> populated with the starting commits, but those need not to be
>> unreachable from each other, and walking down parents we can encounter
>> starting commit already in the queue.  Am I correct?
>
> We can also reach commits in multiple ways, so the initial conditions
> are not the only ways to insert duplicates.

Right.

[...]
>>> +	if (c->object.flags & UNINTERESTING)
>>> +		mark_parents_uninteresting(c);
>>> +
>>> +	for (p = c->parents; p; p = p->next)
>>> +		test_flag_and_insert(&info->explore_queue, p->item, TOPO_WALK_EXPLORED);
>>
>> Do we need to insert parents to the queue even if they were marked
>> UNINTERESTING?
>
> We need to propagate the UNINTERESTING flag to our parents. That
> propagation happens in process_parents().

I think I understand.  We need to propagate UNINTERESTING flag down the
chain, isn't it?

[...]
>>>     static void init_topo_walk(struct rev_info *revs)
>>>   {
>>>   	struct topo_walk_info *info;
>>> +	struct commit_list *list;
>>>   	revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info));
>>>   	info = revs->topo_walk_info;
>>>   	memset(info, 0, sizeof(struct topo_walk_info));
>>>   -	limit_list(revs);
>>> -	sort_in_topological_order(&revs->commits, revs->sort_order);
>>> +	init_indegree_slab(&info->indegree);
>>> +	memset(&info->explore_queue, '\0', sizeof(info->explore_queue));
>>> +	memset(&info->indegree_queue, '\0', sizeof(info->indegree_queue));
>>> +	memset(&info->topo_queue, '\0', sizeof(info->topo_queue));
>>
>> Why this memset uses '\0' as a filler value and not 0?  The queues are
>> not strings [and you use 0 in other places].

I think you missed answering about this issue.

[...]
>> Looks all right.
>
> Thanks for taking the time on this huge patch!

You are welcome.

-- 
Jakub Narębski

  reply	other threads:[~2018-10-26 16:55 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-27 20:41 [PATCH 0/6] Use generation numbers for --topo-order Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 21:23 ` [PATCH 0/6] Use generation numbers for --topo-order Junio C Hamano
2018-09-18  4:08 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-09-18 18:02     ` SZEDER Gábor
2018-09-19 19:31       ` Junio C Hamano
2018-09-19 19:38         ` Junio C Hamano
2018-09-20 21:18           ` Junio C Hamano
2018-09-18  4:08   ` [PATCH v2 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-18  5:51     ` Ævar Arnfjörð Bjarmason
2018-09-18  6:05   ` [PATCH v2 0/6] Use generation numbers for --topo-order Ævar Arnfjörð Bjarmason
2018-09-21 15:47     ` Derrick Stolee
2018-09-21 17:39   ` [PATCH v3 0/7] " Derrick Stolee via GitGitGadget
2018-09-21 17:39     ` [PATCH v3 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-26 19:15       ` Derrick Stolee
2018-10-11 13:54       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-11 13:57       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-11 13:58       ` Jeff King
2018-10-12  4:34         ` Junio C Hamano
2018-09-21 17:39     ` [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-11 14:06       ` Jeff King
2018-10-12  6:33       ` Junio C Hamano
2018-10-12 12:32         ` Derrick Stolee
2018-10-12 16:15         ` Johannes Sixt
2018-10-13  8:05           ` Junio C Hamano
2018-09-21 17:39     ` [PATCH v3 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-11 14:21       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 6/7] revision.h: add whitespace in flag definitions Derrick Stolee via GitGitGadget
2018-09-21 17:39     ` [PATCH v3 7/7] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-27 17:57       ` Derrick Stolee
2018-10-06 16:56         ` Jakub Narebski
2018-10-11 15:35       ` Jeff King
2018-10-11 16:21         ` Derrick Stolee
2018-10-25  9:43           ` Jeff King
2018-10-25 13:00             ` Derrick Stolee
2018-10-11 22:32       ` Stefan Beller
2018-09-21 21:22     ` [PATCH v3 0/7] Use generation numbers for --topo-order Junio C Hamano
2018-10-16 22:36     ` [PATCH v4 " Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-21 10:21         ` Jakub Narebski
2018-10-21 15:28           ` Derrick Stolee
2018-10-16 22:36       ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-21 15:55         ` Jakub Narebski
2018-10-22  1:12           ` Junio C Hamano
2018-10-22  1:51             ` Derrick Stolee
2018-10-22  1:55               ` [RFC PATCH] revision.c: use new algorithm in A..B case Derrick Stolee
2018-10-25  8:28               ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Junio C Hamano
2018-10-26 20:56                 ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-21 21:17         ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee via GitGitGadget
2018-10-22 13:37         ` Jakub Narebski
2018-10-23 13:54           ` Derrick Stolee
2018-10-26 16:55             ` Jakub Narebski [this message]
2018-10-16 22:36       ` [PATCH v4 7/7] t6012: make rev-list tests more interesting Derrick Stolee via GitGitGadget
2018-10-23 15:48         ` Jakub Narebski
2018-10-21 12:57       ` [PATCH v4 0/7] Use generation numbers for --topo-order Jakub Narebski
2018-11-01  5:21       ` Junio C Hamano
2018-11-01 13:49         ` Derrick Stolee
2018-11-01 23:54           ` Junio C Hamano
2018-11-01 13:46       ` [PATCH v5 " Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 1/7] prio-queue: add 'peek' operation Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 2/7] test-reach: add run_three_modes method Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 3/7] test-reach: add rev-list tests Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee
2018-11-01 15:48           ` SZEDER Gábor
2018-11-01 16:12             ` Derrick Stolee
2019-11-08  2:50           ` Mike Hommey
2019-11-11  1:07             ` Derrick Stolee
2019-11-18 23:04               ` SZEDER Gábor
2018-11-01 13:46         ` [PATCH v5 7/7] t6012: make rev-list tests more interesting Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86bm7g641x.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).