git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [GSoC] Blog post on reachability queries
@ 2020-05-20 16:37 Abhishek Kumar
  2020-05-21 14:16 ` jnareb
  0 siblings, 1 reply; 2+ messages in thread
From: Abhishek Kumar @ 2020-05-20 16:37 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb

Greetings everyone!

I am working on implementing Generation Number v2. I have written an
article about reachability queries, which I feel is necessary background
for understanding the project.

Here's the summary of article:
> Reachability refers to the ability to get from one vertex to another
> within a graph.
>
> Reachability queries are an interesting problem, improving performance
> for many graph operations. Better and more sophisticated solutions are
> being created as the size of working graphs keeps increasing.
>
> Reachability for the undirected graph can be found in linear
> preprocessing and constant query time with disjoint set unions. The
> answer isn't as evident for a directed graph because of differing
> performance on positive and negative queries, nature and size of graph
> and other factors. Topological Levels, Post Order DFS Intervals and
> Contraction Hierachies are some of the building blocks for such
> algorithms.

In a later article, I will talk about the specifics of generation number
for Git. In particular, how Git uses reachability queries, the need for
Generation Number v2 i.e., _Correted Commit Date With Strictly Monotonic
Offset_ and other interesting tidbits I come across.

You can find the article here:

https://abhishekkumar2718.github.io/programming/2020/05/20/reachability-queries.html

I appreciate any suggestions or feedback.

Thanks
Abhishek

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [GSoC] Blog post on reachability queries
  2020-05-20 16:37 [GSoC] Blog post on reachability queries Abhishek Kumar
@ 2020-05-21 14:16 ` jnareb
  0 siblings, 0 replies; 2+ messages in thread
From: jnareb @ 2020-05-21 14:16 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, Derrick Stolee, Jakub Narębski

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> Greetings everyone!
>
> I am working on implementing Generation Number v2. I have written an
> article about reachability queries, which I feel is necessary background
> for understanding the project.

Thank you for your introductory work.  It is always nice to have the
problem to be solved well described.

That said, I am not sure if all this theoretical introduction is needed
to understand the reachability labeling that Git is using, and is the
centerpoint of "Generation Number v2", namely the generation number or
topological level.

> Here's the summary of article:
>
>> Reachability refers to the ability to get from one vertex to another
>> within a graph.
>>
>> Reachability queries are an interesting problem, improving performance
>> for many graph operations. Better and more sophisticated solutions are
>> being created as the size of working graphs keeps increasing.

One caveat: the solution that works for the scale-free small-world graph
like citation network might not work for the commit graph, and vice
versa.  Also, not all algorithms for static graph would work well for
constantly growing history graph (for commit graph).

But it is also what makes it interesting, in my opinion.

>>
>> Reachability for the undirected graph can be found in linear
>> preprocessing and constant query time with disjoint set unions. The
>> answer isn't as evident for a directed graph because of differing
>> performance on positive and negative queries, nature and size of graph
>> and other factors. Topological Levels, Post Order DFS Intervals and
>> Contraction Hierarchies are some of the building blocks for such
>> algorithms.

All right.

>
> In a later article, I will talk about the specifics of generation number
> for Git. In particular, how Git uses reachability queries, the need for
> Generation Number v2 i.e., _Corrected Commit Date With Strictly Monotonic
> Offset_ and other interesting tidbits I come across.

Actually, _Corrected Commit Date With Strictly Monotonic Offset_ variant
of generation number v2 is needed to have *backward compatibility*.  It
turned out that there was a bug in implementing versioning for
serialized commit-graph format, namely that unsupported (newer) versions
of file make Git crash, instead of just not using commit-graph file to
speed up operations.

But it seems that there is a workaround. It happens that if commit graph
file is missing the CDAT chunk, then Git wouldn't use it to speed up
operations, but would not hard fail.  So if we put generation number v2
in a chunk with different name, e.g. CDA2, we would not need backward
compatibility.  Then simpler (and benchmarked) variant of _Corrected
Date_ would be enough.

There are advantages and disadvantages of each approach (assuming that
the second one actually works as described).  The first is more complex,
but backward compatibile with respect to reading (using) commit-graph.
The second is simpler, but old Git using new format would get
performance regression.

> You can find the article here:
>
> https://abhishekkumar2718.github.io/programming/2020/05/20/reachability-queries.html

Regards,
--
Jakub Narębski

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, back to index

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-20 16:37 [GSoC] Blog post on reachability queries Abhishek Kumar
2020-05-21 14:16 ` jnareb

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git