Greetings everyone! I am working on implementing Generation Number v2. I have written an article about reachability queries, which I feel is necessary background for understanding the project. Here's the summary of article: > Reachability refers to the ability to get from one vertex to another > within a graph. > > Reachability queries are an interesting problem, improving performance > for many graph operations. Better and more sophisticated solutions are > being created as the size of working graphs keeps increasing. > > Reachability for the undirected graph can be found in linear > preprocessing and constant query time with disjoint set unions. The > answer isn't as evident for a directed graph because of differing > performance on positive and negative queries, nature and size of graph > and other factors. Topological Levels, Post Order DFS Intervals and > Contraction Hierachies are some of the building blocks for such > algorithms. In a later article, I will talk about the specifics of generation number for Git. In particular, how Git uses reachability queries, the need for Generation Number v2 i.e., _Correted Commit Date With Strictly Monotonic Offset_ and other interesting tidbits I come across. You can find the article here: https://abhishekkumar2718.github.io/programming/2020/05/20/reachability-queries.html I appreciate any suggestions or feedback. Thanks Abhishek

Abhishek Kumar <abhishekkumar8222@gmail.com> writes: > Greetings everyone! > > I am working on implementing Generation Number v2. I have written an > article about reachability queries, which I feel is necessary background > for understanding the project. Thank you for your introductory work. It is always nice to have the problem to be solved well described. That said, I am not sure if all this theoretical introduction is needed to understand the reachability labeling that Git is using, and is the centerpoint of "Generation Number v2", namely the generation number or topological level. > Here's the summary of article: > >> Reachability refers to the ability to get from one vertex to another >> within a graph. >> >> Reachability queries are an interesting problem, improving performance >> for many graph operations. Better and more sophisticated solutions are >> being created as the size of working graphs keeps increasing. One caveat: the solution that works for the scale-free small-world graph like citation network might not work for the commit graph, and vice versa. Also, not all algorithms for static graph would work well for constantly growing history graph (for commit graph). But it is also what makes it interesting, in my opinion. >> >> Reachability for the undirected graph can be found in linear >> preprocessing and constant query time with disjoint set unions. The >> answer isn't as evident for a directed graph because of differing >> performance on positive and negative queries, nature and size of graph >> and other factors. Topological Levels, Post Order DFS Intervals and >> Contraction Hierarchies are some of the building blocks for such >> algorithms. All right. > > In a later article, I will talk about the specifics of generation number > for Git. In particular, how Git uses reachability queries, the need for > Generation Number v2 i.e., _Corrected Commit Date With Strictly Monotonic > Offset_ and other interesting tidbits I come across. Actually, _Corrected Commit Date With Strictly Monotonic Offset_ variant of generation number v2 is needed to have *backward compatibility*. It turned out that there was a bug in implementing versioning for serialized commit-graph format, namely that unsupported (newer) versions of file make Git crash, instead of just not using commit-graph file to speed up operations. But it seems that there is a workaround. It happens that if commit graph file is missing the CDAT chunk, then Git wouldn't use it to speed up operations, but would not hard fail. So if we put generation number v2 in a chunk with different name, e.g. CDA2, we would not need backward compatibility. Then simpler (and benchmarked) variant of _Corrected Date_ would be enough. There are advantages and disadvantages of each approach (assuming that the second one actually works as described). The first is more complex, but backward compatibile with respect to reading (using) commit-graph. The second is simpler, but old Git using new format would get performance regression. > You can find the article here: > > https://abhishekkumar2718.github.io/programming/2020/05/20/reachability-queries.html Regards, -- Jakub Narębski