git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v3 7/7] revision.c: refactor basic topo-order logic
Date: Thu, 11 Oct 2018 12:21:44 -0400	[thread overview]
Message-ID: <1c0eb7d3-7f31-9377-d42f-4ac2f36ac26a@gmail.com> (raw)
In-Reply-To: <20181011153510.GF27312@sigill.intra.peff.net>

On 10/11/2018 11:35 AM, Jeff King wrote:
> On Fri, Sep 21, 2018 at 10:39:36AM -0700, Derrick Stolee via GitGitGadget wrote:
>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When running a command like 'git rev-list --topo-order HEAD',
>> Git performed the following steps:
>> [...]
>> In the new algorithm, these three steps correspond to three
>> different commit walks. We run these walks simultaneously,
> A minor nit, but this commit message doesn't mention the most basic
> thing up front: that its main purpose is to introduce a new algorithm
> for topo-order. ;)
>
> It's obvious in the context of reviewing the series, but somebody
> reading "git log" later may want a bit more. Perhaps:
>
>    revision.c: implement generation-based topo-order algorithm
>
> as a subject, and/or an introductory paragraph like:
>
>    The current --topo-order algorithm requires walking all commits we
>    are going to output up front, topo-sorting them, all before
>    outputting the first value. This patch introduces a new algorithm
>    which uses stored generation numbers to incrementally walk in
>    topo-order, outputting commits as we go.
>
> Other than that, I find this to be a wonderfully explanatory commit
> message. :)

Good idea. I'll make that change.

>
>> The walks are as follows:
>>
>> 1. EXPLORE: using the explore_queue priority queue (ordered by
>>     maximizing the generation number), parse each reachable
>>     commit until all commits in the queue have generation
>>     number strictly lower than needed. During this walk, update
>>     the UNINTERESTING flags as necessary.
> OK, this makes sense. If we know that everybody else in our queue is at
> generation X, then it is safe to output a commit at generation greater
> than X.
>
> I think this by itself would allow us to implement "show no parents
> before all of its children are shown", right? But --topo-order promises
> a bit more: "avoid showing commits no multiple lines of history
> intermixed".
>
> I guess also INFINITY generation numbers need more. For a real
> generation number, we know that "gen(A) == gen(B)" implies that there is
> no ancestry relationship between the two. But not so for INFINITY.

Yeah, to deal with INFINITY (and ZERO, but that won't happen if 
generation_numbers_enabled() returns true), we treat gen(A) == gen(B) as 
a "no information" state. So, to output a commit at generation X, we 
need to have our maximum generation number in the unexplored area to be 
at most X - 1. You'll see strict inequality when checking generations.


>> 2. INDEGREE: using the indegree_queue priority queue (ordered
>>     by maximizing the generation number), add one to the in-
>>     degree of each parent for each commit that is walked. Since
>>     we walk in order of decreasing generation number, we know
>>     that discovering an in-degree value of 0 means the value for
>>     that commit was not initialized, so should be initialized to
>>     two. (Recall that in-degree value "1" is what we use to say a
>>     commit is ready for output.) As we iterate the parents of a
>>     commit during this walk, ensure the EXPLORE walk has walked
>>     beyond their generation numbers.
> I wondered how this would work for INFINITY. We can't know the order of
> a bunch of INFINITY nodes at all, so we never know when their in-degree
> values are "done". But if I understand the EXPLORE walk, we'd basically
> walk all of INFINITY down to something with a real generation number. Is
> that right?
>
> But after that, I'm not totally clear on why we need this INDEGREE walk.

The INDEGREE walk is an important element for Kahn's algorithm. The 
final output order is dictated by peeling commits of "indegree zero" to 
ensure all children are output before their parents. (Note: since we use 
literal 0 to mean "uninitialized", we peel commits when the indegree 
slab has value 1.)

This walk replaces the indegree logic from sort_in_topological_order(). 
That method performs one walk that fills the indegree slab, then another 
walk that peels the commits with indegree 0 and inserts them into a list.

>> 3. TOPO: using the topo_queue priority queue (ordered based on
>>     the sort_order given, which could be commit-date, author-
>>     date, or typical topo-order which treats the queue as a LIFO
>>     stack), remove a commit from the queue and decrement the
>>     in-degree of each parent. If a parent has an in-degree of
>>     one, then we add it to the topo_queue. Before we decrement
>>     the in-degree, however, ensure the INDEGREE walk has walked
>>     beyond that generation number.
> OK, this makes sense to make --author-date-order, etc, work. Potentially
> those numbers might have no relationship at all with the graph
> structure, but we promise "no parent before its children are shown", so
> this is really just a tie-breaker after the topo-sort anyway. As long as
> steps 1 and 2 are correct and produce a complete set of commits for one
> "layer", this should be OK.
>
> I guess I'm not 100% convinced that we don't have a case where we
> haven't yet parsed or considered some commit that we know cannot have an
> ancestry relationship with commits we are outputting. But it may have an
> author-date-order relationship.
>
> (I'm not at all convinced that this _is_ a problem, and I suspect it
> isn't; I'm only suggesting I haven't fully grokked the proof).
The INDEGREE walk should not stop until it has explored at least to the 
point that all indegree 0 commits are exposed (relative to the current 
state of the walk).

At initialization, we walk from all starting positions until the maximum 
generation number in our queue is less than the minimum generation of a 
starting commit. The starting positions that have indegree 0 are then 
added to the topo_queue, and the sort order dictates which is the best. 
 From this point on, we can only create a new "indegree 0" commit by 
removing a commit from the topo_queue and decrementing the indegree of 
its parents. Those parents with indegree 0 are inserted into topo_queue 
and compared to all other indegree 0 commits. Thus, we will always 
explore enough to make the right choice relative our sort order.

>
>> ---
>>   object.h   |   4 +-
>>   revision.c | 196 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>>   revision.h |   2 +
>>   3 files changed, 194 insertions(+), 8 deletions(-)
> I'll pause here on evaluating the actual code. It looks sane from a
> cursory read, but there's no point in digging further until I'm sure I
> fully understand the algorithm. I think that needs a little more brain
> power from me, and hopefully discussion around my comments above will
> help trigger that.
Thanks for reading! I understand that reading the code is useless 
without understanding the high-level concepts. I'm happy to iterate on 
this. If I can find a better way to explain the algorithm in the commit 
message to avoid the "huh?" moments above, then I will.

Thanks,
-Stolee

  reply	other threads:[~2018-10-11 16:21 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-27 20:41 [PATCH 0/6] Use generation numbers for --topo-order Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 21:23 ` [PATCH 0/6] Use generation numbers for --topo-order Junio C Hamano
2018-09-18  4:08 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-09-18 18:02     ` SZEDER Gábor
2018-09-19 19:31       ` Junio C Hamano
2018-09-19 19:38         ` Junio C Hamano
2018-09-20 21:18           ` Junio C Hamano
2018-09-18  4:08   ` [PATCH v2 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-09-18  4:08   ` [PATCH v2 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-18  5:51     ` Ævar Arnfjörð Bjarmason
2018-09-18  6:05   ` [PATCH v2 0/6] Use generation numbers for --topo-order Ævar Arnfjörð Bjarmason
2018-09-21 15:47     ` Derrick Stolee
2018-09-21 17:39   ` [PATCH v3 0/7] " Derrick Stolee via GitGitGadget
2018-09-21 17:39     ` [PATCH v3 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-26 19:15       ` Derrick Stolee
2018-10-11 13:54       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-11 13:57       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-11 13:58       ` Jeff King
2018-10-12  4:34         ` Junio C Hamano
2018-09-21 17:39     ` [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-11 14:06       ` Jeff King
2018-10-12  6:33       ` Junio C Hamano
2018-10-12 12:32         ` Derrick Stolee
2018-10-12 16:15         ` Johannes Sixt
2018-10-13  8:05           ` Junio C Hamano
2018-09-21 17:39     ` [PATCH v3 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-11 14:21       ` Jeff King
2018-09-21 17:39     ` [PATCH v3 6/7] revision.h: add whitespace in flag definitions Derrick Stolee via GitGitGadget
2018-09-21 17:39     ` [PATCH v3 7/7] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-27 17:57       ` Derrick Stolee
2018-10-06 16:56         ` Jakub Narebski
2018-10-11 15:35       ` Jeff King
2018-10-11 16:21         ` Derrick Stolee [this message]
2018-10-25  9:43           ` Jeff King
2018-10-25 13:00             ` Derrick Stolee
2018-10-11 22:32       ` Stefan Beller
2018-09-21 21:22     ` [PATCH v3 0/7] Use generation numbers for --topo-order Junio C Hamano
2018-10-16 22:36     ` [PATCH v4 " Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-16 22:36       ` [PATCH v4 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-21 10:21         ` Jakub Narebski
2018-10-21 15:28           ` Derrick Stolee
2018-10-16 22:36       ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-21 15:55         ` Jakub Narebski
2018-10-22  1:12           ` Junio C Hamano
2018-10-22  1:51             ` Derrick Stolee
2018-10-22  1:55               ` [RFC PATCH] revision.c: use new algorithm in A..B case Derrick Stolee
2018-10-25  8:28               ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Junio C Hamano
2018-10-26 20:56                 ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-21 21:17         ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee via GitGitGadget
2018-10-22 13:37         ` Jakub Narebski
2018-10-23 13:54           ` Derrick Stolee
2018-10-26 16:55             ` Jakub Narebski
2018-10-16 22:36       ` [PATCH v4 7/7] t6012: make rev-list tests more interesting Derrick Stolee via GitGitGadget
2018-10-23 15:48         ` Jakub Narebski
2018-10-21 12:57       ` [PATCH v4 0/7] Use generation numbers for --topo-order Jakub Narebski
2018-11-01  5:21       ` Junio C Hamano
2018-11-01 13:49         ` Derrick Stolee
2018-11-01 23:54           ` Junio C Hamano
2018-11-01 13:46       ` [PATCH v5 " Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 1/7] prio-queue: add 'peek' operation Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 2/7] test-reach: add run_three_modes method Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 3/7] test-reach: add rev-list tests Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee
2018-11-01 13:46         ` [PATCH v5 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee
2018-11-01 15:48           ` SZEDER Gábor
2018-11-01 16:12             ` Derrick Stolee
2019-11-08  2:50           ` Mike Hommey
2019-11-11  1:07             ` Derrick Stolee
2019-11-18 23:04               ` SZEDER Gábor
2018-11-01 13:46         ` [PATCH v5 7/7] t6012: make rev-list tests more interesting Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1c0eb7d3-7f31-9377-d42f-4ac2f36ac26a@gmail.com \
    --to=stolee@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).