git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: Derrick Stolee <stolee@gmail.com>, git <git@vger.kernel.org>,
	Heba Waly <heba.waly@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Emily Shaffer <emilyshaffer@google.com>,
	Abhishek Kumar <abhishekkumar8222@gmail.com>
Subject: Re: [RFC] Possible idea for GSoC 2020
Date: Wed, 18 Mar 2020 14:55:54 +0100	[thread overview]
Message-ID: <86mu8d6a39.fsf@gmail.com> (raw)
In-Reply-To: <CAP8UFD1ihR1PtM2y24HTKa2woGXMVFq9MoVSj1cHVZCNWSH7EA@mail.gmail.com> (Christian Couder's message of "Tue, 17 Mar 2020 18:04:23 +0100")

Christian Couder <christian.couder@gmail.com> writes:
> On Tue, Mar 17, 2020 at 3:18 PM Jakub Narebski <jnareb@gmail.com> wrote:
>> Christian Couder <christian.couder@gmail.com> writes:
>>> On Tue, Mar 17, 2020 at 4:13 AM Jakub Narebski <jnareb@gmail.com> wrote:
[...]
>>>> As I wrote below, such positive-cut filter would be directly helpful in
>>>> performing the following commands:
>>>>
>>>>  - `git merge-base --is-ancestor`
>>>>  - `git branch --contains`
>>>>  - `git tag --contains`
>>>>  - `git branch --merged`
>>>>  - `git tag --merged`
>>>>
>>>> It would be also useful for tag autofollow in git-fetch; is is N-to-M
>>>> equivalent to 1-to-N / N-to-1 `--contains` queries.
>>>>
>>>> I am quite sure that positive-cut filter would make `--ancestry-path`
>>>> walk faster.
>>>>
>>>> I think, but I am not sure, that positive-cut filter can make parts of
>>>> topological sort and merge base algorithms at least a tiny bit faster.
>>>
>>> Is there an easy way to check that it would provide significant
>>> performance improvements at least in some cases? Can we ask the
>>> student to do that at the beginning of the GSoC?
>>
>> The "Reachability labels for version control graphs.ipynb" Jupyter
>> Notebook on Google Colaboratory was created to answer this question
>> (originally for the FELINE reachability index).  Among others it can
>> min-post interval labels and topological levels (generation numbers),
>> use them for reachability queries, and load Linux kernel
>> commit-graph. The exploration didn't get finished, but it would be not
>> that difficult, I think, to at least find the amount of false negatives
>> for min-post interval labeling for git.git or Linux kernel repo.
>>
>>   https://colab.research.google.com/drive/1V-U7_slu5Z3s5iEEMFKhLXtaxSu5xyzg
>>
>> As Jupyter Notebook, it is run in the web browser.  It can either use
>> local runtime, or run on Google Cloud runtime.  On the other hand it
>> requires at least some knowledge of Python...
>
> Ok, if that's a possible approach to the project, then please add it
> to the description.

Added as a new paragraph in the 'interval labels' subtask description
https://github.com/git/git.github.io/commit/ac526b18b97e53a431767ce147f27eaf6af0aa76

[...]
>>>>> My recommendation is that a GSoC student update the
>>>>> generation number to "v2" based on the definition you made in [1].
>>>>> That proposal is also more likely to be effective in Git because
>>>>> it makes use of extra heuristic information (commit date) to
>>>>> assist the types of algorithms we care about.
>>>>>
>>>>> In that case, the "difficult" part is moving the "generation"
>>>>> member of struct commit into a slab before making it a 64-bit
>>>>> value. (This is likely necessary for your plan, anyway.) Updating
>>>>> the generation number to v2 is relatively straight-forward after
>>>>> that, as someone can follow all places that reference or compute
>>>>> generation numbers and apply a diff
>>>>
>>>> Good idea!  Though I am not sure if it is not too late to add it to the
>>>> https://git.github.io/SoC-2020-Ideas/ as the self imposed deadline of
>>>> March 16 (where students can start submitting proposals to GSoC) has
>>>> just passed.  Christian, what do you think?
>>>
>>> Would that be a different project idea or part of your "Graph labeling
>>> for speeding up git commands" project idea?
>>>
>>> I am very reluctant to add new project ideas at that time. I don't
>>> think student will have time to properly research it and get it
>>> reviewed.
[...]

>> So perhaps we should expand "Commit graph labeling for speeding up git
>> commands" idea, splitting it into two possible ways of working in this
>> project: the more technical 'Generation number v2', and 'Interval labels
>> for commit graph' which is more of research project?
>
> Ok for that.
>
>>  Which should be
>> put first, then?
>
> Which ever you prefer. If you say that any part could be done
> separately, that doesn't matter much.

I have added 'Generation number v2' as one of alternative ways of
working on the more generic "Commit graph labeling for speeding up git
commands" idea -- as first task, because it fit better the narrative:
https://github.com/git/git.github.io/commit/a6d59820709417081c334a5120342987ff046f1a

Could you (or Stolee) check current proposal, so that it can be merged
in?  Thanks in advance.

  https://github.com/git/git.github.io/blob/soc-2020-idea-jnareb/SoC-2020-Ideas.md

Best,
-- 
Jakub Narębski

  reply	other threads:[~2020-03-18 13:56 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 14:50 [RFC] Possible idea for GSoC 2020 Jakub Narebski
2020-03-11 19:03 ` Junio C Hamano
2020-03-13 10:56   ` Jakub Narebski
2020-03-15 14:26     ` Jakub Narebski
2020-03-17 12:24       ` Kaartic Sivaraam
2020-03-17 12:39         ` Kaartic Sivaraam
2020-03-17 14:22         ` Jakub Narebski
2020-03-11 20:29 ` Christian Couder
2020-03-11 21:30   ` Jakub Narebski
2020-03-11 21:48     ` Christian Couder
2020-03-12 12:17       ` Jakub Narebski
2020-03-12 13:08         ` Jakub Narebski
2020-03-13 10:59           ` Jakub Narebski
2020-03-13 13:08 ` Philip Oakley
2020-03-13 14:34   ` Jakub Narebski
2020-03-15 18:57     ` Philip Oakley
2020-03-15 21:14       ` Jakub Narebski
2020-03-16 14:47         ` Philip Oakley
2020-03-16 12:44 ` Derrick Stolee
2020-03-17  3:13   ` Jakub Narebski
2020-03-17  7:24     ` Christian Couder
2020-03-17 11:49       ` Derrick Stolee
2020-03-17 14:18       ` Jakub Narebski
2020-03-17 17:04         ` Christian Couder
2020-03-18 13:55           ` Jakub Narebski [this message]
2020-03-18 15:25             ` Derrick Stolee
2020-03-19 12:52               ` Jakub Narebski
  -- strict thread matches above, loose matches on Subject: below --
2020-03-13 17:30 Abhishek Kumar
2020-03-17 17:00 Abhishek Kumar
2020-03-17 18:05 ` Jakub Narebski
2020-03-17 18:00 Abhishek Kumar
2020-03-19 12:50 ` Jakub Narebski
     [not found] <CAHk66ftQqFqP-4kd4-8cHtCMEofSUvbeSQ24pcCCrkz7+2JG1w@mail.gmail.com>
2020-03-27 18:31 ` Jakub Narębski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86mu8d6a39.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=heba.waly@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).