git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Abhishek Kumar <abhishekkumar8222@gmail.com>
Cc: Christian Couder <christian.couder@gmail.com>,
	git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>
Subject: Re: [RFC] Possible idea for GSoC 2020
Date: Thu, 19 Mar 2020 13:50:00 +0100	[thread overview]
Message-ID: <86eeto5x1j.fsf@gmail.com> (raw)
In-Reply-To: <CAHk66fv-fRZdUFC978sKUT_b7VYi6S2f2vdTQ6iB-wcCznnBHg@mail.gmail.com> (Abhishek Kumar's message of "Tue, 17 Mar 2020 23:30:00 +0530")

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> Greetings Jakub
>
>> So perhaps we should expand "Commit graph labeling for speeding up git
>> commands" idea, splitting it into two possible ways of working in this
>> project: the more technical 'Generation number v2', and 'Interval labels
>> for commit graph' which is more of research project?  Which should be
>> put first, then?
>
> I would suggest working on generation number v2 first because:
> - We ship improved performance *sooner*.
> - I find it easier to shift from simpler to more complex systems.

It would be also more of an engineering project, than research one.

> On a personal note, I would be able to do a better job at working on generation
> number v2 than on interval labels. While reading through the papers mentioned
> was fun and full of a-ha moments, I find building things more
> fulfilling. I would be glad if either you or Derrick opt for mentoring it.

All right.  I have added 'Generation number v2' as possible way of
working on the "Commit graph labeling for speeding up git commands"
idea.

https://git.github.io/SoC-2020-Ideas/#commit-graph-labeling-for-speeding-up-git-commands


> You could read my proposal at the link below. It is very rough and I haven't
> proofread it yet. I will send out a more formal proposal once the direction
> of this project is decided.
>
> https://github.com/abhishekkumar2718/GSoC20/blob/master/graph_labelling.md
>
> **Too long, didn't read**

I will comment only on the TLDR version below.

> - Commit graphs are small to medium-sized (compared to problem sizes observed in
> graph-theory literature) sparse graphs. They have unusual properties compared
> to more conventional graphs that can be exploited for better performance.

Right.

> - Most of git's reachability queries are negative and using a negative-cut
> filter improves performance more than a positive-cut filter.

Not exactly.  Most time consuming git graph queries are those that
return a subgraph or otherwise need to walk the commit graph (like
topological sorting, or ahead/behind numbers).  Here in most cases
negative-cut filters, that reduce the number of commits walked, improve
performance more (with rare exceptions, like --ancestry-path query).

That is what I think, but it is not something I have proven...

> - Implementing and maintaining a two-dimensional reachability index is hard
> and does not offer justifiable performance gains.

I think it would be better to not mention graph reachability algorithms
at all, and just talk about generation number v2 as straight performance
improvement over generation number v1.

> - We plan to use corrected commit date as the generation number v2 because it is
> locally computable, immutable and can be incrementally updated.

Unfortunately because of an oopsie with respect to commit-graph file
format versioning, at least for the time being generation number v2
needs to be also backward compatibile with generation number v1.  See
for example:
  https://git.github.io/rev_news/2019/06/28/edition-52/#reviews
  https://speakerdeck.com/jnareb/graph-operations-in-git-and-how-to-make-them-faster?slide=74

>
> - If git ever considers a two-dimensional reachability index, either post order
> DFS, GRAIL or an index based on commit date would be good starting
> places to explore.
>
> I go in more details about GRAIL, FERRARI and PReaCH, explaining
> briefly how they work, their advantages, and disadvantages.

Again, I don't think that is necessary.

Best,
-- 
Jakub Narębski

  reply	other threads:[~2020-03-19 12:50 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-17 18:00 [RFC] Possible idea for GSoC 2020 Abhishek Kumar
2020-03-19 12:50 ` Jakub Narebski [this message]
     [not found] <CAHk66ftQqFqP-4kd4-8cHtCMEofSUvbeSQ24pcCCrkz7+2JG1w@mail.gmail.com>
2020-03-27 18:31 ` Jakub Narębski
  -- strict thread matches above, loose matches on Subject: below --
2020-03-17 17:00 Abhishek Kumar
2020-03-17 18:05 ` Jakub Narebski
2020-03-13 17:30 Abhishek Kumar
2020-03-10 14:50 Jakub Narebski
2020-03-11 19:03 ` Junio C Hamano
2020-03-13 10:56   ` Jakub Narebski
2020-03-15 14:26     ` Jakub Narebski
2020-03-17 12:24       ` Kaartic Sivaraam
2020-03-17 12:39         ` Kaartic Sivaraam
2020-03-17 14:22         ` Jakub Narebski
2020-03-11 20:29 ` Christian Couder
2020-03-11 21:30   ` Jakub Narebski
2020-03-11 21:48     ` Christian Couder
2020-03-12 12:17       ` Jakub Narebski
2020-03-12 13:08         ` Jakub Narebski
2020-03-13 10:59           ` Jakub Narebski
2020-03-13 13:08 ` Philip Oakley
2020-03-13 14:34   ` Jakub Narebski
2020-03-15 18:57     ` Philip Oakley
2020-03-15 21:14       ` Jakub Narebski
2020-03-16 14:47         ` Philip Oakley
2020-03-16 12:44 ` Derrick Stolee
2020-03-17  3:13   ` Jakub Narebski
2020-03-17  7:24     ` Christian Couder
2020-03-17 11:49       ` Derrick Stolee
2020-03-17 14:18       ` Jakub Narebski
2020-03-17 17:04         ` Christian Couder
2020-03-18 13:55           ` Jakub Narebski
2020-03-18 15:25             ` Derrick Stolee
2020-03-19 12:52               ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86eeto5x1j.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).