git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: Christian Couder <christian.couder@gmail.com>
Subject: [RFC] Possible idea for GSoC 2020
Date: Tue, 10 Mar 2020 15:50:24 +0100	[thread overview]
Message-ID: <86mu8o8dsf.fsf@gmail.com> (raw)

Hello,

Here below is a possible proposal for a more difficult Google Summer of
Code 2020 project.

A few questions:
- is it too late to propose a new project idea for GSoC 2020?
- is it too difficult of a project for GSoC?

Best,

  Jakub Narębski

--------------------------------------------------

### Graph labelling for speeding up git commands

 - Language: C
 - Difficulty: hard / difficult
 - Possible mentors: Jakub Narębski

Git uses various clever methods for making operations on very large
repositories faster, from bitmap indices for git-fetch[1], to generation
numbers (also known as topological levels) in the commit-graph file for
commit graph traversal operations like `git log --graph`[2].

One possible improvement that can make Git even faster is using min-post
intervals labelling.  The basis of this labelling is post-visit order of
a depth-first search traversal tree of a commit graph, let's call it
'post(v)'.

If for each commit 'v' we would compute and store in the commit-graph
file two numbers: 'post(v)' and the minimum of 'post(u)' for all commits
reachable from 'v', let's call the latter 'min_graph(v)', then the
following condition is true:

  if 'v' can reach 'u', then min_graph(v) <= post(u) <= post(v)

If for each commit 'v' we would compute and store in the commit-graph
file two numbers: 'post(v)' and the minimum of 'post(u)' for commits
that were visited during the part of depth-first search that started
from 'v' (which is the minimum of post-order number for subtree of a
spanning tree that starts at 'v').  Let's call the later 'min_tree(v)'.
Then the following condition is true:

  if min_tree(v) <= post(u) <= post(v), then 'v' can reach 'u'

The task would be to implement computing such labelling (or a more
involved variant of it[3][4]), storing it in commit-graph file, and
using it for speeding up git commands (starting from a single chosen
command) such as:

 - git merge-base --is-ancestor A B
 - git branch --contains A
 - git tag --contains A
 - git branch --merged A
 - git tag --merged A
 - git merge-base --all A B
 - git log --topo-sort

References:

1. <http://githubengineering.com/counting-objects/>
2. <https://devblogs.microsoft.com/devops/supercharging-the-git-commit-graph-iii-generations/>
3. <https://arxiv.org/abs/1404.4465>
4. <https://github.com/steps/Ferrari>

See also discussion in:

<https://public-inbox.org/git/86tvl0zhos.fsf@gmail.com/t/>

             reply	other threads:[~2020-03-10 14:50 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 14:50 Jakub Narebski [this message]
2020-03-11 19:03 ` [RFC] Possible idea for GSoC 2020 Junio C Hamano
2020-03-13 10:56   ` Jakub Narebski
2020-03-15 14:26     ` Jakub Narebski
2020-03-17 12:24       ` Kaartic Sivaraam
2020-03-17 12:39         ` Kaartic Sivaraam
2020-03-17 14:22         ` Jakub Narebski
2020-03-11 20:29 ` Christian Couder
2020-03-11 21:30   ` Jakub Narebski
2020-03-11 21:48     ` Christian Couder
2020-03-12 12:17       ` Jakub Narebski
2020-03-12 13:08         ` Jakub Narebski
2020-03-13 10:59           ` Jakub Narebski
2020-03-13 13:08 ` Philip Oakley
2020-03-13 14:34   ` Jakub Narebski
2020-03-15 18:57     ` Philip Oakley
2020-03-15 21:14       ` Jakub Narebski
2020-03-16 14:47         ` Philip Oakley
2020-03-16 12:44 ` Derrick Stolee
2020-03-17  3:13   ` Jakub Narebski
2020-03-17  7:24     ` Christian Couder
2020-03-17 11:49       ` Derrick Stolee
2020-03-17 14:18       ` Jakub Narebski
2020-03-17 17:04         ` Christian Couder
2020-03-18 13:55           ` Jakub Narebski
2020-03-18 15:25             ` Derrick Stolee
2020-03-19 12:52               ` Jakub Narebski
  -- strict thread matches above, loose matches on Subject: below --
2020-03-13 17:30 Abhishek Kumar
2020-03-17 17:00 Abhishek Kumar
2020-03-17 18:05 ` Jakub Narebski
2020-03-17 18:00 Abhishek Kumar
2020-03-19 12:50 ` Jakub Narebski
     [not found] <CAHk66ftQqFqP-4kd4-8cHtCMEofSUvbeSQ24pcCCrkz7+2JG1w@mail.gmail.com>
2020-03-27 18:31 ` Jakub Narębski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86mu8o8dsf.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).