From: Jakub Narebski <jnareb@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: Derrick Stolee <stolee@gmail.com>, git <git@vger.kernel.org>,
Heba Waly <heba.waly@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Jonathan Tan <jonathantanmy@google.com>,
Emily Shaffer <emilyshaffer@google.com>,
Abhishek Kumar <abhishekkumar8222@gmail.com>
Subject: Re: [RFC] Possible idea for GSoC 2020
Date: Wed, 18 Mar 2020 14:55:54 +0100 [thread overview]
Message-ID: <86mu8d6a39.fsf@gmail.com> (raw)
In-Reply-To: <CAP8UFD1ihR1PtM2y24HTKa2woGXMVFq9MoVSj1cHVZCNWSH7EA@mail.gmail.com> (Christian Couder's message of "Tue, 17 Mar 2020 18:04:23 +0100")
Christian Couder <christian.couder@gmail.com> writes:
> On Tue, Mar 17, 2020 at 3:18 PM Jakub Narebski <jnareb@gmail.com> wrote:
>> Christian Couder <christian.couder@gmail.com> writes:
>>> On Tue, Mar 17, 2020 at 4:13 AM Jakub Narebski <jnareb@gmail.com> wrote:
[...]
>>>> As I wrote below, such positive-cut filter would be directly helpful in
>>>> performing the following commands:
>>>>
>>>> - `git merge-base --is-ancestor`
>>>> - `git branch --contains`
>>>> - `git tag --contains`
>>>> - `git branch --merged`
>>>> - `git tag --merged`
>>>>
>>>> It would be also useful for tag autofollow in git-fetch; is is N-to-M
>>>> equivalent to 1-to-N / N-to-1 `--contains` queries.
>>>>
>>>> I am quite sure that positive-cut filter would make `--ancestry-path`
>>>> walk faster.
>>>>
>>>> I think, but I am not sure, that positive-cut filter can make parts of
>>>> topological sort and merge base algorithms at least a tiny bit faster.
>>>
>>> Is there an easy way to check that it would provide significant
>>> performance improvements at least in some cases? Can we ask the
>>> student to do that at the beginning of the GSoC?
>>
>> The "Reachability labels for version control graphs.ipynb" Jupyter
>> Notebook on Google Colaboratory was created to answer this question
>> (originally for the FELINE reachability index). Among others it can
>> min-post interval labels and topological levels (generation numbers),
>> use them for reachability queries, and load Linux kernel
>> commit-graph. The exploration didn't get finished, but it would be not
>> that difficult, I think, to at least find the amount of false negatives
>> for min-post interval labeling for git.git or Linux kernel repo.
>>
>> https://colab.research.google.com/drive/1V-U7_slu5Z3s5iEEMFKhLXtaxSu5xyzg
>>
>> As Jupyter Notebook, it is run in the web browser. It can either use
>> local runtime, or run on Google Cloud runtime. On the other hand it
>> requires at least some knowledge of Python...
>
> Ok, if that's a possible approach to the project, then please add it
> to the description.
Added as a new paragraph in the 'interval labels' subtask description
https://github.com/git/git.github.io/commit/ac526b18b97e53a431767ce147f27eaf6af0aa76
[...]
>>>>> My recommendation is that a GSoC student update the
>>>>> generation number to "v2" based on the definition you made in [1].
>>>>> That proposal is also more likely to be effective in Git because
>>>>> it makes use of extra heuristic information (commit date) to
>>>>> assist the types of algorithms we care about.
>>>>>
>>>>> In that case, the "difficult" part is moving the "generation"
>>>>> member of struct commit into a slab before making it a 64-bit
>>>>> value. (This is likely necessary for your plan, anyway.) Updating
>>>>> the generation number to v2 is relatively straight-forward after
>>>>> that, as someone can follow all places that reference or compute
>>>>> generation numbers and apply a diff
>>>>
>>>> Good idea! Though I am not sure if it is not too late to add it to the
>>>> https://git.github.io/SoC-2020-Ideas/ as the self imposed deadline of
>>>> March 16 (where students can start submitting proposals to GSoC) has
>>>> just passed. Christian, what do you think?
>>>
>>> Would that be a different project idea or part of your "Graph labeling
>>> for speeding up git commands" project idea?
>>>
>>> I am very reluctant to add new project ideas at that time. I don't
>>> think student will have time to properly research it and get it
>>> reviewed.
[...]
>> So perhaps we should expand "Commit graph labeling for speeding up git
>> commands" idea, splitting it into two possible ways of working in this
>> project: the more technical 'Generation number v2', and 'Interval labels
>> for commit graph' which is more of research project?
>
> Ok for that.
>
>> Which should be
>> put first, then?
>
> Which ever you prefer. If you say that any part could be done
> separately, that doesn't matter much.
I have added 'Generation number v2' as one of alternative ways of
working on the more generic "Commit graph labeling for speeding up git
commands" idea -- as first task, because it fit better the narrative:
https://github.com/git/git.github.io/commit/a6d59820709417081c334a5120342987ff046f1a
Could you (or Stolee) check current proposal, so that it can be merged
in? Thanks in advance.
https://github.com/git/git.github.io/blob/soc-2020-idea-jnareb/SoC-2020-Ideas.md
Best,
--
Jakub Narębski
next prev parent reply other threads:[~2020-03-18 13:56 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-10 14:50 [RFC] Possible idea for GSoC 2020 Jakub Narebski
2020-03-11 19:03 ` Junio C Hamano
2020-03-13 10:56 ` Jakub Narebski
2020-03-15 14:26 ` Jakub Narebski
2020-03-17 12:24 ` Kaartic Sivaraam
2020-03-17 12:39 ` Kaartic Sivaraam
2020-03-17 14:22 ` Jakub Narebski
2020-03-11 20:29 ` Christian Couder
2020-03-11 21:30 ` Jakub Narebski
2020-03-11 21:48 ` Christian Couder
2020-03-12 12:17 ` Jakub Narebski
2020-03-12 13:08 ` Jakub Narebski
2020-03-13 10:59 ` Jakub Narebski
2020-03-13 13:08 ` Philip Oakley
2020-03-13 14:34 ` Jakub Narebski
2020-03-15 18:57 ` Philip Oakley
2020-03-15 21:14 ` Jakub Narebski
2020-03-16 14:47 ` Philip Oakley
2020-03-16 12:44 ` Derrick Stolee
2020-03-17 3:13 ` Jakub Narebski
2020-03-17 7:24 ` Christian Couder
2020-03-17 11:49 ` Derrick Stolee
2020-03-17 14:18 ` Jakub Narebski
2020-03-17 17:04 ` Christian Couder
2020-03-18 13:55 ` Jakub Narebski [this message]
2020-03-18 15:25 ` Derrick Stolee
2020-03-19 12:52 ` Jakub Narebski
-- strict thread matches above, loose matches on Subject: below --
2020-03-13 17:30 Abhishek Kumar
2020-03-17 17:00 Abhishek Kumar
2020-03-17 18:05 ` Jakub Narebski
2020-03-17 18:00 Abhishek Kumar
2020-03-19 12:50 ` Jakub Narebski
[not found] <CAHk66ftQqFqP-4kd4-8cHtCMEofSUvbeSQ24pcCCrkz7+2JG1w@mail.gmail.com>
2020-03-27 18:31 ` Jakub Narębski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86mu8d6a39.fsf@gmail.com \
--to=jnareb@gmail.com \
--cc=abhishekkumar8222@gmail.com \
--cc=christian.couder@gmail.com \
--cc=emilyshaffer@google.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=heba.waly@gmail.com \
--cc=jonathantanmy@google.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).